A scaling-invariant algorithm for linear programming whose running time depends only on the constraint matrix
† Supported by the ERC Starting Grants ScaleOpt and QIP.
Abstract
Following the breakthrough work of Tardos (Oper. Res. ’86) in the bit-complexity model, Vavasis and Ye (Math. Prog. ’96) gave the first exact algorithm for linear programming in the real model of computation with running time depending only on the constraint matrix. For solving a linear program (LP) $\min\, c^\top x$, $Ax=b$, $x\ge 0$, $A\in\mathbb{R}^{m\times n}$, Vavasis and Ye developed a primal-dual interior point method using a ‘layered least squares’ (LLS) step, and showed that $O(n^{3.5}\log(\bar\chi_A+n))$ iterations suffice to solve (LP) exactly, where $\bar\chi_A$ is a condition measure controlling the size of solutions to linear systems related to $A$.
Monteiro and Tsuchiya (SIAM J. Optim. ’03), noting that the central path is invariant under rescalings of the columns of $A$ and of $c$, asked whether there exists an LP algorithm depending instead on the measure $\bar\chi^*_A$, defined as the minimum value of $\bar\chi_{AD}$ achievable by a column rescaling $AD$ of $A$, and gave strong evidence that this should be the case. We resolve this open question affirmatively.
Our first main contribution is an $O(m^2n^2+n^3)$ time algorithm which works on the linear matroid of $A$ to compute a nearly optimal diagonal rescaling $D$ satisfying $\bar\chi_{AD}\le n(\bar\chi^*_A)^3$. This algorithm also allows us to approximate the value of $\bar\chi_A$ up to a factor $n(\bar\chi^*_A)^2$. This result is in (surprising) contrast to that of Tunçel (Math. Prog. ’99), who showed NP-hardness for approximating $\bar\chi_A$ to within a factor $2^{\mathrm{poly}(\mathrm{rank}(A))}$. The key insight for our algorithm is to work with ratios $g_j/g_i$ of circuits of $A$—i.e., minimal linear dependencies $Ag=0$—which allow us to approximate the value of $\bar\chi^*_A$ by a maximum geometric mean cycle computation in what we call the ‘circuit ratio digraph’ of $A$.
While this resolves Monteiro and Tsuchiya’s question by appropriate preprocessing, it falls short of providing either a truly scaling invariant algorithm or an improvement upon the base LLS analysis. In this vein, as our second main contribution we develop a scaling invariant LLS algorithm, which uses and dynamically maintains improving estimates of the circuit ratio digraph, together with a refined potential function based analysis for LLS algorithms in general. With this analysis, we derive an improved $O(n^{2.5}\log n\,\log(\bar\chi^*_A+n))$ iteration bound for optimally solving (LP) using our algorithm. The same argument also yields a factor $n/\log n$ improvement on the iteration complexity bound of the original Vavasis-Ye algorithm.
1 Introduction
The linear programming (LP) problem in primal-dual form is to solve
$$
\begin{aligned}
\min\;& c^\top x \qquad & \max\;& b^\top y\\
& Ax=b & & A^\top y+s=c\\
& x\ge 0 & & s\ge 0
\end{aligned}
\qquad \text{(LP)}
$$
where $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$, $c\in\mathbb{R}^n$ are given in the input, and $x\in\mathbb{R}^n$, $y\in\mathbb{R}^m$, $s\in\mathbb{R}^n$ are the variables. We consider the program in $x$ to be the primal problem and the program in $(y,s)$ to be the dual problem.
Khachiyan [Kha79] used the ellipsoid method to give the first polynomial time LP algorithm in the bit-complexity model, that is, polynomial in the bit description length of $(A,b,c)$. Following Khachiyan’s work, the now forty-year-old open question is whether there exists a strongly polynomial time algorithm for LP. The task is to solve (LP) using $\mathrm{poly}(n,m)$ basic arithmetic operations. Furthermore, the algorithm must be in PSPACE, that is, the numbers occurring in the computations must remain polynomially bounded in the input size. Known strongly polynomially solvable LP problem classes include: feasibility for two variable per inequality systems [Meg83], the minimum-cost circulation problem [Tar85], the maximum generalized flow problem [Vég17, OV17], and discounted Markov decision problems [Ye05, Ye11].
For more general LP classes, for which strongly polynomial algorithms are not known, the principal line of attack has been to reduce the numerical complexity of LP algorithms. More precisely, the goal has been to develop algorithms whose number of arithmetic operations depend on natural condition measures of the base LP; at a high level, these condition measures attempt to finely measure the “intrinsic complexity” of the LP. An important line of work in this area has been to parametrize LPs by the “niceness” of their solutions (e.g. the depth of the most interior point), where relevant examples include the Goffin measure [Gof80] for conic systems and Renegar’s distance to ill-posedness for general LPs [Ren94, Ren95], and bounded ratios between the nonzero entries in basic feasible solutions [Chu14, KM13].
Parametrizing by the constraint matrix
A second line of research, and the main focus of this work, makes no assumptions on the “niceness” of solutions and instead focuses on the complexity of the constraint matrix $A$. The first breakthrough in this area was given by Tardos [Tar86], who showed that if $A$ has integer entries and all square submatrices of $A$ have determinant at most $\Delta$ in absolute value, then (LP) can be solved in time $\mathrm{poly}(n,m,\log\Delta)$. This is achieved by finding the exact solutions to a sequence of rounded LPs derived from the original LP, with the right hand side vector and cost function being integers of absolute value bounded in terms of $n$ and $\Delta$. From such rounded problem instances, one can infer, via proximity results, that a constraint $x_i=0$ must be valid for every optimal solution. The process continues by induction until the optimal primal face is identified.
Path-following methods and the Vavasis-Ye algorithm
In a seminal work, Vavasis and Ye [VY96] introduced a new type of interior-point method that optimally solves (LP) within $O(n^{3.5}\log(\bar\chi_A+n))$ iterations, where the condition number $\bar\chi_A$ controls the size of solutions to certain linear systems related to the kernel of $A$ (see Section 2 for the formal definition).
Before detailing the Vavasis-Ye (henceforth VY) algorithm, we recall the basics of path following interior-point methods. If both the primal and dual problems in (LP) are strictly feasible, the central path for (LP) is the curve $\{(x(\mu),y(\mu),s(\mu)):\mu>0\}$ defined by
$$
x(\mu)\,s(\mu)=\mu e,\qquad
Ax(\mu)=b,\;\; x(\mu)>0,\qquad
A^\top y(\mu)+s(\mu)=c,\;\; s(\mu)>0,
\qquad \text{(CP)}
$$
which converges to complementary optimal primal and dual solutions $(x^*,y^*,s^*)$ as $\mu\to 0$, recalling that the optimality gap at time $\mu$ is exactly $x(\mu)^\top s(\mu)=n\mu$. We thus refer to $\mu$ as the normalized duality gap. Methods that “follow the path” generate iterates that stay in a certain neighborhood around it while trying to achieve rapid multiplicative progress w.r.t. $\mu$, where given $(x,y,s)$ close to the path, we define the effective $\mu$ as $\mu(x,s):=x^\top s/n$. Given a target parameter $\mu'$ and starting point close to the path at parameter $\mu$, standard path following methods [Gon92] can compute a point at parameter below $\mu'$ in at most $O(\sqrt{n}\log(\mu/\mu'))$ iterations, and hence the quantity $\log(\mu/\mu')$ can be usefully interpreted as the length of the corresponding segment of the central path.
Crossover events and layered least squares steps
At a very high level, Vavasis and Ye show that the central path can be decomposed into at most $O(n^2)$ short but curved segments, possibly joined by long (a priori unbounded) but very straight segments. At the end of each curved segment, they show that a new ordering relation between two variables—called a ‘crossover event’—is implicitly learned, where this relation did not hold at the start of the segment, but will hold at every point from the end of the segment onwards. These relations give a combinatorial way to measure progress along the central path. In contrast to Tardos’s algorithm, where the main progress is setting variables to zero explicitly, the variables participating in crossover events cannot be identified; only their existence is shown.
At a technical level, the VY algorithm is a variant of the Mizuno-Todd-Ye [MTY93] predictor-corrector method (MTY P-C). In predictor-corrector methods, corrector steps bring an iterate closer to the path, i.e., improve centrality, and predictor steps “shoot down” the path, i.e., reduce without losing too much centrality. VY’s main algorithmic innovation was the introduction of a new predictor step, called the ‘layered least squares’ (LLS) step, which crucially allowed them to cross each aforementioned “straight” segment of the central path in a single step, recalling that these straight segments may be arbitrarily long. To traverse the short and curved segments of the path, the standard predictor step, known as affine scaling (AS), in fact suffices.
To compute the LLS direction, the variables are decomposed into ‘layers’ $J_1\cup J_2\cup\cdots\cup J_p$. The goal of such a decomposition is to eventually learn a refinement of the optimal partition of the variables $B^*\cup N^*$, where $B^*=\mathrm{supp}(x^*)$ and $N^*=\mathrm{supp}(s^*)$ for the limit optimal solution $(x^*,y^*,s^*)$ of the central path.
The primal affine scaling direction can be equivalently described by solving a weighted least squares problem in $\ker(A)$, with respect to a weighting defined according to the current iterate. The primal LLS direction is obtained by solving a series of weighted least squares problems, starting with focusing only on the final layer $J_p$. This solution is gradually extended to the higher layers (which refers to layers with lower indices). The dual directions have analogous interpretations, with the solutions on the layers obtained in the opposite direction, starting with $J_1$. If we use the two-level layering $J_1=B^*$, $J_2=N^*$, and $(x,y,s)$ is sufficiently close to the limit of the central path, then the LLS step reaches an exact optimal solution in a single step. We note that standard AS steps generically never find an exact optimal solution, and thus some form of “LLS rounding” is always necessary to achieve finite termination.
Of course, guessing $B^*$ and $N^*$ correctly is just as hard as solving (LP). Still, if we work with a “good” layering, the LLS steps will reveal new information about the “optimal order” of the variables, where $B^*$ is placed on higher layers than $N^*$. The crossover events correspond to swapping two wrongly ordered variables into the correct ordering. Namely, before the crossover event, a variable $i\in B^*$ and a variable $j\in N^*$ are ordered on the same layer, or $j$ is in a higher layer than $i$. After the crossover event, $i$ will always be placed on a higher layer than $j$.
Computing good layerings and the condition measure $\bar\chi_A$
Given the above discussion, the obvious question is how to come up with “good” layerings. The philosophy behind LLS can be stated as saying that if modifying the variables $x_I$ in a set $I\subseteq[n]$ barely affects the variables in $I^c=[n]\setminus I$ (recalling that movement is constrained to $\ker(A)$), then one should optimize over $x_I$ without regard to the effect on $x_{I^c}$; hence $I$ should be placed on lower layers.
VY’s strategy for computing such layerings was to directly use the size of the coordinates of the current iterate $x$ (where $(x,y,s)$ is a point near the central path). In particular, assuming $x_1\ge x_2\ge\cdots\ge x_n$, the layering $J_1\cup J_2\cup\cdots\cup J_p$ corresponds to consecutive intervals constructed in decreasing order of the $x_i$ values. The break between $J_k$ and $J_{k+1}$ occurs if the gap $x_r/x_{r+1}>g$, where $r$ is the rightmost element of $J_k$ and $g$ is a threshold parameter. Thus, the expectation is that if $x_i>g\,x_j$, then a small multiplicative change to $x_j$, subject to moving in $\ker(A)$, should induce a small multiplicative change to $x_i$. By proximity to the central path, the dual ordering is reversed as mentioned above.
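As an illustration, the VY layering rule just described can be sketched as follows (a simplified sketch; the gap threshold `g` and the handling of ties are illustrative assumptions, not the exact parameters of [VY96]):

```python
def vy_layering(x, g):
    """Partition indices into layers: sort x in non-increasing order and
    start a new layer whenever two consecutive values are separated by a
    multiplicative gap larger than g."""
    order = sorted(range(len(x)), key=lambda i: -x[i])
    layers = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if x[prev] > g * x[cur]:  # large gap: break between layers
            layers.append([cur])
        else:
            layers[-1].append(cur)
    return layers
```

For example, with $x=(100,90,1,0.5,10^{-6})$ and $g=10$, the layers are $\{1,2\}$, $\{3,4\}$, $\{5\}$: within each layer, consecutive values stay within a factor $g$ of each other.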
The threshold $g$ for which this was justified in VY was a function of the condition measure $\bar\chi_A$. We now provide a convenient definition, which immediately yields this justification (see Proposition 2.4). Letting $W=\ker(A)$ and writing $\pi_I$ for the projection onto the coordinates in $I$, we define $\bar\chi_A=\bar\chi_W$ as the minimum number $M\ge 1$ such that for any index set $I\subseteq[n]$ and $p\in\pi_I(W)$, there exists $z\in W$ with $z_I=p$ and $\|z\|\le M\|p\|$. Thus, a change of $\varepsilon$ in the variables in $I$ can be lifted to a change of at most $\bar\chi_A\varepsilon$ in the variables in $I^c$. Crucially, $\bar\chi$ is a “self-dual” quantity. That is, $\bar\chi_W=\bar\chi_{W^\perp}$, where $W^\perp$ is the movement subspace for the dual problem, justifying the reversed layering for the dual (see Section 2 for more details).
The question of scale invariance and $\bar\chi^*_A$
While the VY layering procedure is powerful, its properties are somewhat mismatched with those of the central path. In particular, variable ordering information has no intrinsic meaning on the central path, as the path itself is scaling invariant. Namely, the central path point w.r.t. the problem instance $(A,b,c)$ is in bijective correspondence with the central path point w.r.t. the problem instance $(AD,b,Dc)$ for any positive definite diagonal matrix $D$. The standard path following algorithms are also scaling invariant in this sense.
This led Monteiro and Tsuchiya [MT03] to ask whether a scaling invariant LLS algorithm exists. They noted that any such algorithm would then depend on the potentially much smaller parameter
$$
\bar\chi^*_A=\inf\{\bar\chi_{AD}:D\in\mathbf{D}_n\},
\qquad (1)
$$
where the infimum is taken over the set $\mathbf{D}_n$ of $n\times n$ positive diagonal matrices. Thus, Monteiro and Tsuchiya’s question can be rephrased as asking whether there exists an exact LP algorithm with running time $\mathrm{poly}(n,m,\log\bar\chi^*_A)$.
Substantial progress on this question was made in the followup works [MT05, LMT09]. [MT05] showed that the MTY predictor-corrector algorithm [MTY93] can get from parameter $\mu^0$ to parameter $\eta\mu^0$ on the central path in a number of iterations bounded by $O(n^{3.5}\log\bar\chi^*_A)$ plus a term depending only doubly logarithmically on $\mu^0/(\eta\mu^0)$. This is attained by showing that the standard AS steps are reasonably close to the LLS steps. This proximity can be used to show that the AS steps can traverse the curved parts of the central path in the same iteration complexity bound as the VY algorithm. Moreover, on the “straight” parts of the path, the rate of progress amplifies geometrically, thus attaining a $\log\log$ convergence on these parts. Subsequently, [LMT09] developed an affine invariant trust region step, which traverses the full path in a number of iterations depending only on $n$ and $\log\bar\chi^*_A$. However, each iteration is weakly polynomial in $b$ and $c$. The question of developing an LP algorithm with complexity bound $\mathrm{poly}(n,m,\log\bar\chi^*_A)$ thus remained open.
A related open problem to the above is whether it is possible to compute a near-optimal rescaling $D$ for program (1). This would give an alternate pathway to the desired LP algorithm by simply preprocessing the matrix $A$. The related question of approximating $\bar\chi_A$ was already studied by Tunçel [Tun99], who showed NP-hardness for approximating $\bar\chi_A$ to within a factor $2^{\mathrm{poly}(\mathrm{rank}(A))}$. Taken at face value, this may seem to suggest that approximating the optimal rescaling should be hard.
Our contributions.
In this work, we resolve all of the above questions in the affirmative. We detail our contributions below.
1. Finding an approximately optimal rescaling. As our first contribution, we give an $O(m^2n^2+n^3)$ time algorithm which works on the linear matroid of $A$ to compute a diagonal rescaling matrix $D$ which achieves $\bar\chi_{AD}\le n(\bar\chi^*_A)^3$, given any matrix $A\in\mathbb{R}^{m\times n}$. Furthermore, this same algorithm allows us to approximate $\bar\chi_A$ to within a factor $n(\bar\chi^*_A)^2$. The algorithm bypasses Tunçel’s hardness result by allowing the approximation factor to depend on $A$ itself, namely on $\bar\chi^*_A$. This gives a simple first answer to Monteiro and Tsuchiya’s question: by applying the Vavasis-Ye algorithm directly on the preprocessed matrix $AD$, we may solve any LP with constraint matrix $A$ using $O(n^{3.5}\log(\bar\chi^*_A+n))$ iterations. Note that the approximation factor only increases the runtime by a constant factor, since $\log(n(\bar\chi^*_A)^3)=O(\log(\bar\chi^*_A+n))$.
To achieve this result, we work directly with the circuits of $A$, where a circuit $C\subseteq[n]$ is the support of a minimal linear dependency between the columns of $A$. With each circuit $C$, we can associate a vector $g^C\in\ker(A)$ with $\mathrm{supp}(g^C)=C$ that is unique up to scaling. By the ‘circuit ratio’ $\kappa_{ij}$ of the pair $(i,j)$, we mean the largest ratio $|g^C_j/g^C_i|$ taken over every circuit $C$ of $A$ such that $i,j\in C$. As our first observation, we show that the maximum of all circuit ratios, which we call the ‘circuit imbalance measure’, in fact characterizes $\bar\chi_A$ up to a factor $n$. This measure was first studied by Vavasis [Vav94], who showed that it lower bounds $\bar\chi_A$, though, as far as we are aware, our upper bound is new. The circuit ratios of the pairs $(i,j)$ induce a weighted directed graph we call the circuit ratio digraph of $A$. From here, our main result is that $\bar\chi^*_A$ is, up to a factor polynomial in $n$, equal to the maximum geometric mean cycle in the circuit ratio digraph. Our approximation algorithm populates the circuit ratio digraph with estimates of the ratios $\kappa_{ij}$ for each pair $i,j\in[n]$ using basic matroid techniques, and then computes a rescaling by solving the dual of the maximum geometric mean ratio cycle problem on the ‘approximate circuit ratio digraph’.
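To illustrate the maximum geometric mean cycle computation on such a digraph (the graph and its weights below are made up; the actual algorithm of this paper runs on circuit ratio estimates), one can take logarithms of the edge weights and apply Karp's mean-cycle recurrence:

```python
import math

def max_geometric_mean_cycle(n, weight):
    """weight: dict mapping edges (i, j) to positive weights.  Returns the
    maximum geometric mean over all directed cycles, or None if the digraph
    is acyclic.  Uses Karp's recurrence on log-weights: D[k][v] is the
    maximum log-weight of a walk with exactly k edges ending at v."""
    NEG = float("-inf")
    D = [[0.0] * n] + [[NEG] * n for _ in range(n)]
    for k in range(1, n + 1):
        for (i, j), w in weight.items():
            if D[k - 1][i] > NEG:
                D[k][j] = max(D[k][j], D[k - 1][i] + math.log(w))
    best = NEG
    for v in range(n):
        if D[n][v] == NEG:
            continue  # no walk of length n ends here
        # Karp: max mean cycle = max_v min_k (D[n][v] - D[k][v]) / (n - k).
        val = min((D[n][v] - D[k][v]) / (n - k)
                  for k in range(n) if D[k][v] > NEG)
        best = max(best, val)
    return None if best == NEG else math.exp(best)
```

On a toy graph with two 2-cycles $1\rightleftarrows 2$ and $1\rightleftarrows 3$ of weight pairs $(4,1)$ and $(9,1)$, the maximum geometric mean is $\sqrt{9\cdot 1}=3$.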
2. Scaling invariant LLS algorithm. While the above yields an LP algorithm with $\mathrm{poly}(n,m,\log\bar\chi^*_A)$ running time, it does not satisfactorily address Monteiro and Tsuchiya’s question for a scaling invariant algorithm. As our second contribution, we use the circuit ratio digraph directly to give a natural scaling invariant LLS layering algorithm together with a scaling invariant crossover analysis.
At a conceptual level, we show that the circuit ratios give a scale invariant way to measure whether ‘$x_i\gg x_j$’ and enable a natural layering algorithm. Let $\kappa_{ij}$ be the circuit imbalance between $i$ and $j$, i.e., the maximum value $|g_j/g_i|$ over minimal kernel solutions $g$ containing $i$ and $j$ in their support. Given the circuit ratio graph induced by the $\kappa_{ij}$’s and a primal point $x$ near the path, our layering algorithm can be described as follows. We first rescale the variables so that $x$ becomes the all ones vector, which rescales $\kappa_{ij}$ to $\kappa_{ij}x_j/x_i$. We then restrict the graph to its ‘long’ edges—those whose rescaled circuit ratio exceeds a fixed threshold—and let the layering be a topological ordering of its strongly connected components (SCCs), with edges between components going from left to right. Intuitively, variables that “affect each other” should be in the same layer, which motivates the SCC definition. It is not hard to see that VY’s original layering algorithm can be recovered in this way by artificially declaring the rescaled circuit ratios to be $x_j/x_i$ for all pairs $i,j\in[n]$.
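The SCC-based layering step can be sketched as follows (the ratio estimates and the threshold below are placeholders; in the actual algorithm the edge lengths are the rescaled circuit ratio estimates):

```python
def scc_layering(n, ratio, threshold):
    """Keep only 'long' edges (i, j) with ratio[(i, j)] >= threshold, and
    return the strongly connected components of the remaining digraph in
    topological order (Kosaraju's algorithm); long edges then only go from
    earlier layers to later ones."""
    adj = [[] for _ in range(n)]
    radj = [[] for _ in range(n)]
    for (i, j), r in ratio.items():
        if r >= threshold:  # keep only long edges
            adj[i].append(j)
            radj[j].append(i)
    # First pass: record finish order on the forward graph.
    seen, order = [False] * n, []
    def dfs1(u):
        seen[u] = True
        for v in adj[u]:
            if not seen[v]:
                dfs1(v)
        order.append(u)
    for u in range(n):
        if not seen[u]:
            dfs1(u)
    # Second pass on the reverse graph yields SCCs in topological order.
    comp = [None] * n
    layers = []
    def dfs2(u, bucket):
        comp[u] = len(layers)
        bucket.append(u)
        for v in radj[u]:
            if comp[v] is None:
                dfs2(v, bucket)
    for u in reversed(order):
        if comp[u] is None:
            bucket = []
            dfs2(u, bucket)
            layers.append(sorted(bucket))
    return layers
```

Variables joined by long edges in both directions end up in a common layer, matching the intuition that they "affect each other".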
We note that our layering algorithm does not in fact have access to the true circuit ratios $\kappa_{ij}$, as these are NP-hard to compute. Getting a good enough initial estimate for our purposes is however easy: we let $\hat\kappa_{ij}$ be the ratio $|g_j/g_i|$ corresponding to an arbitrary circuit containing $i$ and $j$. This already turns out to be within a factor $(\bar\chi^*_A)^2$ of the true value $\kappa_{ij}$, which we recall is the maximum over all such circuits. Our layering algorithm in fact learns better circuit ratio estimates whenever the “lifting costs” of our SCC layering, i.e., how much it costs to lift changes from lower layer variables to higher layers (as in the definition of $\bar\chi_A$), turn out to be larger than expected.
For our analysis, we define crossover events in a scaling invariant way, in terms of the rescaled circuit ratios $\kappa_{ij}x_j/x_i$: before the crossover event, the relation ‘$x_i\gg x_j$’ may fail in this rescaled sense, and after the crossover event, it holds at all further central path points. Our analysis relies on $\bar\chi^*_A$ in only a minimalistic way, and does not require an estimate on the value of $\bar\chi^*_A$. Namely, it is only used to show that if the lifting score of a layer is small, then the rescaled circuit ratios within the layer lie in a bounded range controlled by $\bar\chi^*_A$. The argument to show this crucially utilizes the maximum geometric mean cycle characterization. Furthermore, unlike prior analyses [VY96, MT03], our definition of a “good” layering (i.e., ‘balanced’ layerings, see Section 3.4) is completely independent of $\bar\chi_A$.
3. Improved potential analysis. As our third contribution, we improve the Vavasis-Ye crossover analysis using a new and simple potential function based approach. When applied to our new LLS algorithm, we derive an $O(n^{2.5}\log n\,\log(\bar\chi^*_A+n))$ iteration bound for path following, improving the polynomial term by an $n/\log n$ factor compared to the VY analysis.
Our potential function can be seen as a fine-grained version of the crossover events as described above. In case of such a crossover event, it is guaranteed that in every subsequent iteration, $i$ is in a layer before $j$. Instead, we analyze less radical changes: an “event” parametrized by $\tau$ means that $i$ and $j$ are currently together on a layer of size at most $\tau$, and after the event, either $i$ is on a layer before $j$, or, if they are together on the same layer, then this layer must have size at least $2\tau$. For every LLS step, we can find a parameter $\tau$ such that an event of this type happens concurrently for multiple pairs within the next $O(\sqrt{n}\,\tau\log(\bar\chi^*_A+n))$ iterations.
Our improved analysis is also applicable to the original VY algorithm. Let us now comment on the relation between the VY algorithm and our new algorithm. The VY algorithm starts a new layer once $x_{\pi(i)}>g\,x_{\pi(i+1)}$ between two consecutive variables, where the permutation $\pi$ is a non-increasing order of the variables. Here, $g$ is a threshold depending polynomially on $n$ and $\bar\chi_A$. Setting the initial ‘estimates’ $\hat\kappa_{ij}$ to a suitable polynomial quantity, our algorithm runs the same way as the VY algorithm. Using these estimates, the layering procedure becomes much simpler: there is no need to verify ‘balancedness’ as in our general algorithm.
However, setting the estimates this way has drawbacks. Most importantly, these estimates do not give lower bounds on the true circuit ratios $\kappa_{ij}$—to the contrary, they will be upper bounds! In effect, this causes VY’s layers to be “much larger” than ours, and for this reason, the connection to $\bar\chi^*_A$ is lost. Nevertheless, our potential function analysis can still be adapted to the VY algorithm to obtain the same improvement on the iteration complexity bound; see Section 4.1 for more details.
1.1 Related work
Since the seminal works of Karmarkar [Kar84] and Renegar [Ren88], there has been a tremendous amount of work on speeding up and improving interior-point methods. In contrast to the present work, the focus of these works has mostly been to improve the complexity of approximately solving LPs. Progress has taken many forms, such as the development of novel barrier methods, such as Vaidya’s volumetric barrier [Vai89], the recent entropic barrier of Bubeck and Eldan [BE14], and the weighted log-barrier of Lee and Sidford [LS14, LS19], together with new path following techniques, such as the predictor-corrector framework [Meh92, MTY93], as well as advances in fast linear system solving [ST04, LS15]. For this last line, there has been substantial progress in improving IPMs by amortizing the cost of the iterative updates, and working with approximate computations, see e.g. [Ren88, Vai89, CLS19, vdB20]. Very recently, Cohen, Lee and Song [CLS19] developed a new inverse maintenance scheme yielding a randomized algorithm that solves LP approximately in essentially matrix multiplication time, which was derandomized by van den Brand [vdB20]. For special classes of LP such as network flow problems, fast algorithms have been obtained by using fast Laplacian solvers, see e.g. [DS08, Mad13]. Given the progress above, we believe it to be an interesting problem to understand to what extent these new numerical techniques can be applied to speed up LLS computations, though we expect that such computations will require very high precision. We note that no attempt has been made in the present work to optimize the complexity of the linear algebra.
Ho and Tunçel [HT02] showed how to extend Tardos’ framework to the real model of computation (i.e., to LPs with non-integral data), providing a blackbox alternative to the VY algorithm. The numerical complexity of the LPs arising in their reduction is controlled by the minimum and maximum subdeterminants of $A$ restricted to non-singular submatrices, and the minimum non-zero slack of any basic primal or dual solution over a certain grid of right hand sides and objectives.
With regard to LLS algorithms, the original VY algorithm required explicit knowledge of $\bar\chi_A$ to implement their layering algorithm. [MMT98] showed that this could be avoided by computing all LLS steps associated with candidate partitions and picking the best one. In particular, they showed that all such LLS steps can be computed within a comparable overall running time. [MT03] gave an alternate approach which computes an LLS partition directly from the coefficients of the AS step. We note that these methods crucially rely on the variable ordering, and hence are not scaling invariant. Kitahara and Tsuchiya [KT13] gave a 2-layer LLS step which achieves a running time depending only on $A$ and the right-hand side $b$, but with no dependence on the objective, assuming the primal feasible region is bounded.
A series of papers have studied the central path from a differential geometry perspective. Monteiro and Tsuchiya [MT08] showed that a curvature integral of the central path, first introduced by Sonnevend, Stoer, and Zhao [SSZ91], is in fact upper bounded in terms of $n$ and $\log\bar\chi^*_A$. This has been extended to SDP and symmetric cone programming [KOT14], and also studied in the context of information geometry [KOT13].
Circuits have appeared in several papers on linear and integer optimization (see [DLKS19] and its references). The idea of using circuits within the context of LP algorithms also appears in [DLHL15]. They develop an augmentation framework for LP (as well as ILP) and show that a simplex-like algorithm which takes steps according to the “best circuit” direction achieves linear convergence, though these steps are hard to compute.
Our algorithm makes progress towards strongly polynomial solvability of LP, by improving the dependence from $\mathrm{poly}(n,m,\log\bar\chi_A)$ to $\mathrm{poly}(n,m,\log\bar\chi^*_A)$. However, in a remarkable recent paper, Allamigeon et al. [ABGJ18] have shown, using tools from tropical geometry, that path-following methods for the standard logarithmic barrier cannot be strongly polynomial. In particular, they give a parametrized family of instances, where, for sufficiently large parameter values, any sequence of iterations following the central path must be of exponential length—thus, $\bar\chi_A$ will be doubly exponential. We note that it is unclear whether their instance is robust to changing the barrier method itself; e.g., the weighted log-barrier [LS14].
1.2 Organization
Section 2 begins with the necessary background on the condition measures $\bar\chi_A$ and $\bar\chi^*_A$. It culminates in the approximate rescaling and approximation algorithm (Theorem 2.5) in Section 2.4. This algorithm relies upon the circuit imbalance measure introduced in Section 2.1, the min-max characterization in Section 2.2, and a circuit finding algorithm in Section 2.3.
In Section 3, we develop our scaling invariant interior-point method. Interior-point preliminaries are given in Section 3.1, the layered least squares step is explained in Section 3.3, our scaling invariant layering algorithm is given in Section 3.4, and lastly, our overall algorithm is given in Section 3.5.
In Section 4, we give the potential function proof for the improved iteration bound, relying on technical lemmas. The full proofs of these lemmas are deferred to Section 6; however, Section 4 provides the high-level ideas of each proof. Section 4.1 shows that our argument also leads to a factor $n/\log n$ improvement in the iteration complexity bound of the VY algorithm.
2 Finding an approximately optimal rescaling
Notation
Our notation will largely follow [MT03, MT05]. We let $\mathbb{R}_{++}$ denote the set of positive reals, and $\mathbb{R}_+$ the set of nonnegative reals. For $n\in\mathbb{N}$, we let $[n]=\{1,2,\ldots,n\}$. Let $e_i$ denote the $i$th unit vector, and $e$ the all 1s vector. For a vector $x\in\mathbb{R}^n$, we let $\mathrm{diag}(x)$ denote the diagonal matrix with $x$ on the diagonal. We let $\mathbf{D}_n$ denote the set of all positive definite $n\times n$ diagonal matrices. For $x,y\in\mathbb{R}^n$, we use the notation $xy$ to denote the vector $(x_iy_i)_{i\in[n]}$. The scalar product of the two vectors is denoted as $x^\top y$. For $p\in\mathbb{Q}$, we also use the notation $x^p$ to denote the vector $(x_i^p)_{i\in[n]}$. Similarly, for $x,y\in\mathbb{R}^n$ with $y$ having nonzero entries, we let $x/y$ denote the vector $(x_i/y_i)_{i\in[n]}$. We denote the support of a vector $x\in\mathbb{R}^n$ by $\mathrm{supp}(x)=\{i\in[n]:x_i\ne 0\}$.
For an index subset $I\subseteq[n]$, we use $\pi_I:\mathbb{R}^n\to\mathbb{R}^I$ for the coordinate projection. That is, $\pi_I(x)=x_I$, and for a subset $S\subseteq\mathbb{R}^n$, $\pi_I(S)=\{x_I:x\in S\}$. We let $I^c=[n]\setminus I$.
For a matrix $B\in\mathbb{R}^{n\times m}$, $I\subseteq[n]$ and $J\subseteq[m]$, we let $B_{I,J}$ denote the submatrix of $B$ restricted to the set of rows in $I$ and columns in $J$. We also use $B_{I,\cdot}=B_{I,[m]}$ and $B_J=B_{[n],J}$. We let $B^+$ denote the pseudo-inverse of $B$.
We let $\ker(A)$ denote the kernel of the matrix $A$. Throughout, we assume that the matrix $A$ in (LP) has full row rank, and that $m\le n$.
Subspace formulation
Throughout the paper, we let $W=\ker(A)\subseteq\mathbb{R}^n$ denote the kernel of the matrix $A$. Using this notation, (LP) can be written in the form
$$
\begin{aligned}
\min\;& c^\top x \qquad & \max\;& d^\top(c-s)\\
& x\in W+d & & s\in W^\perp+c\\
& x\ge 0 & & s\ge 0
\end{aligned}
\qquad (2)
$$
where $d\in\mathbb{R}^n$ satisfies $Ad=b$.
The condition number $\bar\chi_A$
The condition number $\bar\chi_A$ is defined as
$$
\bar\chi_A=\sup\left\{\left\|A^\top\left(ADA^\top\right)^{-1}AD\right\|\,:\,D\in\mathbf{D}_n\right\}
=\sup\left\{\frac{\|A^\top y\|}{\|p\|}\,:\,y\text{ minimizes }\|D^{1/2}(A^\top y-p)\|\text{ for some }0\ne p\in\mathbb{R}^n,\ D\in\mathbf{D}_n\right\}.
\qquad (3)
$$
This condition number was first studied by Dikin [Dik67], Stewart [Ste89], and Todd [Tod90], among others, and plays a key role in the analysis of the Vavasis-Ye interior point method [VY96]. There is an extensive literature on the properties and applications of , as well as its relations to other condition numbers. We refer the reader to the papers [HT02, MT03, VY96] for further results and references.
It is important to note that $\bar\chi_A$ only depends on the subspace $W=\ker(A)$. Hence, we can also write $\bar\chi_W$ for a subspace $W\subseteq\mathbb{R}^n$, defined to be equal to $\bar\chi_A$ for some matrix $A$ with $W=\ker(A)$. We will use the notations $\bar\chi_A$ and $\bar\chi_W$ interchangeably.
The next lemma summarizes some important known properties of $\bar\chi_A$.
Proposition 2.1.
Let $A\in\mathbb{R}^{m\times n}$ with full row rank and $W=\ker(A)$.
-
(a) If the entries of $A$ are all integers, then $\bar\chi_A$ is bounded by $2^{O(L_A)}$, where $L_A$ is the input bit length of $A$.
-
(b) $\bar\chi_A=\max\{\|B^{-1}A\|:B\text{ is a non-singular }m\times m\text{ submatrix of }A\}$.
-
(c) Let the columns of $B\in\mathbb{R}^{n\times(n-m)}$ form an orthonormal basis of $W$. Then
$$\bar\chi_W=\max\{\|B(B_{I,\cdot})^+\|:\emptyset\ne I\subseteq[n]\}.$$
-
(d) $\bar\chi_W=\bar\chi_{W^\perp}$.
Proof. We refer the reader to the literature discussed above, e.g. [VY96, HT02, MT03]. ∎

In Proposition 3.8, we will also give another proof of part (d). We now define the lifting map, a key operation in this paper, and explain its connection to $\bar\chi_W$.
Definition 2.2.
Let us define the lifting map $L^W_I:\pi_I(W)\to W$ by
$$L^W_I(p)=\arg\min\{\|z\|:z\in W,\ z_I=p\}.$$
Note that $L^W_I$ is the unique linear map from $\pi_I(W)$ to $W$ such that $(L^W_I(p))_I=p$ and $L^W_I(p)$ is orthogonal to $\{z\in W:z_I=0\}$.
Lemma 2.3.
Let $W\subseteq\mathbb{R}^n$ be an $(n-m)$-dimensional linear subspace. Let the columns of $B\in\mathbb{R}^{n\times(n-m)}$ denote an orthonormal basis of $W$. Then, viewing $L^W_I$ as a matrix,
$$L^W_I=B(B_{I,\cdot})^+.$$
Proof.
If $z\in W$, then $z=B\alpha$ for some $\alpha\in\mathbb{R}^{n-m}$. By the well-known property of the pseudo-inverse, $\alpha=(B_{I,\cdot})^+p$ is the minimum-norm solution of $B_{I,\cdot}\alpha=p$. This solution satisfies $z=B\alpha\in W$, $z_I=p$. Since the columns of $B$ form an orthonormal basis of $W$, we have that $\|z\|=\|\alpha\|$. Consequently, $B(B_{I,\cdot})^+p$ is the minimum-norm point with the above properties. ∎
The above lemma and Proposition 2.1(c) yield the following characterization. This will be the most suitable characterization of $\bar\chi_W$ for our purposes.
Proposition 2.4.
For a linear subspace $W\subseteq\mathbb{R}^n$,
$$\bar\chi_W=\max\{\|L^W_I\|:I\subseteq[n],\ I\ne\emptyset\}.$$
The following notation will be convenient for our algorithm. For a subspace $W\subseteq\mathbb{R}^n$ and an index set $I\subseteq[n]$, if $\pi_I(W)\ne\{0\}$, we define the lifting score
$$\sigma_W(I)=\max\left\{\frac{\|(L^W_I(p))_{I^c}\|}{\|p\|}\,:\,0\ne p\in\pi_I(W)\right\}.\qquad (4)$$
Otherwise, we define $\sigma_W(I)=0$. This means that for any $p\in\pi_I(W)$ and $z=L^W_I(p)$, we have $\|z_{I^c}\|\le\sigma_W(I)\|p\|$.
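As a toy illustration of the lifting map and the lifting score, consider the hypothetical subspace $W=\{z\in\mathbb{R}^n:\sum_i z_i=0\}$, for which the minimum-norm lift has a closed form: the lift must be orthogonal to every $w\in W$ vanishing on $I$, which forces all entries outside $I$ to be equal.

```python
from math import sqrt

def lift_sum_zero(p, I, n):
    """Minimum-norm lift L_I(p) into W = {z : sum(z) = 0}: keep p on the
    coordinates in I and spread -sum(p) evenly over the rest, which is the
    unique lift orthogonal to {w in W : w_I = 0}."""
    J = [k for k in range(n) if k not in I]
    z = [0.0] * n
    for idx, i in enumerate(I):
        z[i] = p[idx]
    for j in J:
        z[j] = -sum(p) / len(J)
    return z

def lifting_score_sum_zero(I, n):
    """sigma_W(I) = max_p ||z_{I^c}|| / ||p||.  Here ||z_{I^c}|| equals
    |sum(p)| / sqrt(n - |I|), maximized at p = all-ones by Cauchy-Schwarz,
    giving sqrt(|I| / (n - |I|))."""
    return sqrt(len(I) / (n - len(I)))
```

For instance, lifting $p=(3,1)$ from $I=\{0,2\}$ (0-indexed) in $\mathbb{R}^5$ gives $z=(3,-4/3,1,-4/3,-4/3)$; any other lift, such as $(3,-4,1,0,0)$, has strictly larger norm.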
The condition number $\bar\chi^*_A$
For every $D\in\mathbf{D}_n$, we can consider the condition number $\bar\chi_{AD}$. We let
$$\bar\chi^*_A=\bar\chi^*_W=\inf\{\bar\chi_{AD}:D\in\mathbf{D}_n\}$$
denote the best possible value of $\bar\chi$ that can be attained by rescaling the coordinates of $W$. The main result of this section is the following theorem.
Theorem 2.5.
There is an $O(m^2n^2+n^3)$ time algorithm that for any matrix $A\in\mathbb{R}^{m\times n}$ computes a rescaling $D\in\mathbf{D}_n$ such that
$$\bar\chi_{AD}\le n(\bar\chi^*_A)^3,$$
and a value $t>0$ such that
$$t\le\bar\chi_A\le t\cdot n(\bar\chi^*_A)^2.$$
2.1 The circuit imbalance measure
We next introduce the circuit imbalance measure, a more combinatorial condition number, and show that it gives a good proxy to $\bar\chi_W$.
Definition 2.6.
For a linear subspace $W\subseteq\mathbb{R}^n$ and a matrix $A$ such that $W=\ker(A)$, a circuit is an inclusion-wise minimal dependent set of columns of $A$. Equivalently, a circuit is a set $C\subseteq[n]$ such that $W\cap\{z:\mathrm{supp}(z)\subseteq C\}$ is one-dimensional and no strict subset of $C$ has this property. Any circuit $C$ is associated with a vector $g^C\in W$ with inclusion-wise minimal support $\mathrm{supp}(g^C)=C$. The set of circuits of $W$ is denoted $\mathcal{C}_W$.
Note that these are also known as the circuits in the linear matroid associated with $A$.
Definition 2.7.
For a circuit $C\in\mathcal{C}_W$, let $g^C\in W$ be such that $\mathrm{supp}(g^C)=C$. For $i,j\in C$, we let
$$\kappa^C_{ij}=\left|\frac{g^C_j}{g^C_i}\right|.\qquad (5)$$
For any $i,j\in[n]$, we define the circuit ratio $\kappa_{ij}$ as the maximum of $\kappa^C_{ij}$ over all choices of the circuit $C$:
$$\kappa_{ij}=\max\{\kappa^C_{ij}:C\in\mathcal{C}_W,\ i,j\in C\}.\qquad (6)$$
By convention we set $\kappa_{ij}=0$ if there is no circuit supporting $i$ and $j$. Further, we define the circuit imbalance measure as
$$\kappa_W=\max\{\kappa_{ij}:i,j\in[n]\}.$$
Minimizing over all coordinate rescalings, we define
$$\kappa^*_W=\inf\{\kappa_{DW}:D\in\mathbf{D}_n\}.$$
We omit the index $W$ whenever it is clear from context. In such cases, for a rescaling $D\in\mathbf{D}_n$, we write $\kappa^D_{ij}$ and $\kappa^D$ for the circuit ratios and circuit imbalance of the rescaled subspace $DW$.
We want to remark that a priori it is not clear that $\kappa^*_W$ is well-defined. Theorem 2.13 will show that the infimum in its definition is indeed attained. Observe that $\kappa^C_{ij}$ does not depend on the choice of $g^C$, since there is only a single choice up to scalar multiplication.
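For intuition, the circuits and the measure $\kappa_W$ can be enumerated by brute force on tiny examples (exponential-time and purely illustrative—the algorithms of this paper never enumerate all circuits):

```python
from fractions import Fraction
from itertools import combinations

def kernel_vector(cols):
    """Given columns (tuples of numbers), return a nonzero g with
    sum_k g[k] * cols[k] = 0 if they are dependent, else None.
    Gaussian elimination over the rationals."""
    m, s = len(cols[0]), len(cols)
    M = [[Fraction(cols[j][i]) for j in range(s)] for i in range(m)]
    pivots, r = [], 0
    for c in range(s):
        piv = next((i for i in range(r, m) if M[i][c] != 0), None)
        if piv is None:
            # column c is a combination of the pivot columns
            g = [Fraction(0)] * s
            g[c] = Fraction(-1)
            for row, pc in enumerate(pivots):
                g[pc] = M[row][c]
            return g
        M[r], M[piv] = M[piv], M[r]
        pivval = M[r][c]
        M[r] = [x / pivval for x in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return None  # columns are linearly independent

def circuits_and_kappa(A):
    """Enumerate all circuits (supports of minimal linear dependencies among
    the columns of A) and return (circuits, kappa), where kappa is the
    circuit imbalance measure max |g_j| / |g_i|."""
    n = len(A[0])
    cols = [tuple(row[j] for row in A) for j in range(n)]
    circuits, kappa = [], Fraction(0)
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            if any(set(C) < set(S) for C in circuits):
                continue  # contains a smaller dependency: not minimal
            g = kernel_vector([cols[j] for j in S])
            if g is not None:
                circuits.append(S)
                vals = [abs(x) for x in g if x != 0]
                kappa = max(kappa, max(vals) / min(vals))
    return circuits, kappa
```

For the $1\times 3$ matrix $A=(1\ 2\ 4)$, the circuits are the three column pairs, with kernel vectors such as $(2,-1,0)$, and $\kappa_W=4$, attained by the pair of the first and third columns.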
Both the circuit ratio and the circuit imbalance measure are self-dual.
Lemma 2.8.
For any subspace $W\subseteq\mathbb{R}^n$ and $i,j\in[n]$, $\kappa_{ij}(W)=\kappa_{ji}(W^\perp)$.
Proof.
Let $C\in\mathcal{C}_W$ with $i,j\in C$ and let $g=g^C$ be its corresponding vector, so that $\kappa^C_{ij}=|g_j|/|g_i|$. We construct a circuit solution in $W^\perp$ that certifies $\kappa_{ji}(W^\perp)\ge\kappa^C_{ij}$.

Define $w\in\mathbb{R}^C$ by $w_i=g_j$, $w_j=-g_i$, and $w_k=0$ for all $k\in C\setminus\{i,j\}$. Then, $w$ is orthogonal to $g_C$ by construction, and since $\pi_C(W^\perp)$ is the orthogonal complement in $\mathbb{R}^C$ of $W\cap\{z:\mathrm{supp}(z)\subseteq C\}=\mathrm{span}(g)$, we have $w\in\pi_C(W^\perp)$.

We will lift the vector $w$ to a circuit solution in $W^\perp$. The next procedure will be applied first for $J=C$, and extends $J$ one coordinate at a time; we terminate once $J=[n]$. Suppose we have a set $J\supseteq C$ and a vector $h\in\pi_J(W^\perp)$ with $h_C=w$, such that $h$ is a circuit solution of $\pi_J(W^\perp)$, that is, its support is inclusion-wise minimal. For the starting set $J=C$, the vector $w$ is a suitable choice. Pick an arbitrary $t\notin J$ and set $J'=J\cup\{t\}$. We look for a circuit solution $h'\in V:=\pi_{J'}(W^\perp)$ with $h'_J=h$. Since $\pi_J(V)=\pi_J(W^\perp)$, some $z\in V$ satisfies $z_J=h$; moreover, by the minimality of $h$, any $v\in V$ with $\mathrm{supp}(v_J)\subseteq\mathrm{supp}(h)$ satisfies $v_J=\beta h$ for some $\beta\in\mathbb{R}$. From here, it is easy to check that the choice of $z$ is unique iff $e_t\notin V$. If $e_t\notin V$, then this unique $z$ has minimal support: any nonzero $v\in V$ with $\mathrm{supp}(v)\subsetneq\mathrm{supp}(z)$ would satisfy $v_J=\beta h$, whence $v-\beta z$ is supported on $\{t\}$ and therefore $v=\beta z$, a contradiction. Thus $h'=z$ and we are done. If $e_t\in V$, the choice of $z$ is not unique. In this case, we may choose $z$ so that $z_t=0$. For this choice, $\mathrm{supp}(z)=\mathrm{supp}(h)$, and $z$ is again a circuit solution of $V$ by the same argument. In particular, $h'_i=g_j$ and $h'_j=-g_i$ throughout, as needed.

Let $h\in W^\perp$ denote the circuit solution obtained at the end of this procedure. Clearly,
$$\kappa_{ji}(W^\perp)\ge\left|\frac{h_i}{h_j}\right|=\left|\frac{g_j}{g_i}\right|=\kappa^C_{ij}.$$
Taking the maximum over circuits $C$, this yields $\kappa_{ji}(W^\perp)\ge\kappa_{ij}(W)$. By swapping the roles of $W$ and $W^\perp$ and of $i$ and $j$, we obtain the reverse inequality. The statement follows. ∎
The next theorem relates the circuit imbalance measure $\kappa_W$ and the condition number $\bar\chi_W$. The lower bound was already proven in [Vav94], and the upper bound is new, as far as we know.
Theorem 2.9.
For a linear subspace $W\subseteq\mathbb{R}^n$,
$$\kappa_W\le\bar\chi_W\le\sqrt{1+(n\kappa_W)^2}.$$
Proof.
For the first inequality, let $C\in\mathcal{C}_W$ and $i,j\in C$ be such that $\kappa_W=|g_j|/|g_i|$ for the corresponding circuit solution $g=g^C$. Let us use the characterization of $\bar\chi_W$ in Proposition 2.4. Let $I=([n]\setminus C)\cup\{i\}$, and $p=e_i$, that is, the vector with $p_i=1$ and $p_k=0$ for $k\in I\setminus\{i\}$. Then, the unique vector $z\in W$ such that $z_I=p$ is $z=g/g_i$. Therefore,
$$\bar\chi_W\ge\|L^W_I(p)\|=\frac{\|g\|}{|g_i|}\ge\frac{|g_j|}{|g_i|}=\kappa_W.$$
The reverse direction is immediate from Lemma 2.11 below. ∎
The next lemmas complete the proof of Theorem 2.9, and will also be used later in the algorithm. Let us say that the vector $y\in\mathbb{R}^n$ is sign-consistent with $x\in\mathbb{R}^n$ if $x_iy_i\ge 0$ for all $i\in[n]$ and $x_i=0$ implies $y_i=0$ for all $i\in[n]$.
Lemma 2.10.
For $I\subseteq[n]$ and $p\in\pi_I(W)$, let $z=L^W_I(p)$. Then for any $j\in[n]$ we have $|z_j|\le\kappa_W\|p\|_1$. Moreover, if $p=e_t$ for some $t\in I$, then $|z_j|\le\kappa_{tj}$.
Proof.
We consider the cone of vectors sign-consistent with . The faces of are bounded by inequalities of the form or . The edges (rays) of are of the form with . It is easy to see from the Minkowski-Weyl theorem that can be written as
where , are circuits, and the vectors are sign-consistent with and for all . Note that for all , as otherwise, would also satisfy , but due to being sign-consistent with , a contradiction to the definition of .
At least one contributes at least as much to as the average. Hence we find . ∎
For the next lemma, recall the definition of the lifting score from (4).
Lemma 2.11.
For a linear subspace and an index set , we can efficiently identify and such that .
Proof.
Take any minimal such that . Then we know that and for we can compute . Let be the matrix sending any to the corresponding vector . The column can be computed as for . We have for any , and so . We upper bound the operator norm by the Frobenius norm . By Lemma 2.10 it follows that . ∎
Our LLS algorithm in Section 3 will use the subroutine described in the proof of Lemma 2.11. For a subspace , an index set , and a threshold , the algorithm Verify-Lift() outputs one of the answers ‘pass’ or ‘fail’. If the answer is ‘pass’, then it is guaranteed that . If the answer is ‘fail’, then a pair of indices , , and a bound are returned, such that .
To implement the algorithm, we first need to select a minimal such that . This can be found by computing a matrix such that , and selecting a maximal number of linearly independent columns of . Then, we compute the matrix that implements the transformation . The algorithm returns the pair corresponding to the entry maximizing .
Remark 2.12.
We note that the algorithm Verify-Lift does not need to compute the circuit as in Lemma 2.10. The following observation will be important in the analysis: the algorithm returns the answer ‘fail’ even if , but .
2.2 A min-max theorem on
We next provide a combinatorial min-max characterization on . Consider the circuit ratio digraph on the node set where if , that is, there exists a circuit with . An edge is said to have weight . (Note that if and only if , but the weight of these two edges can be different.)
Let be a cycle in , that is, a sequence of points . We use to denote the length of the cycle. (In our terminology, ‘cycles’ always refer to objects in , whereas ‘circuits’ refer to the minimum supports in .)
We use the notation . For a vector , we let for . A simple but important observation is that such a rescaling does not change the value associated with the cycle, that is,
| (7) |
We are ready to formulate our theorem.
Theorem 2.13.
For a subspace , we have
Proof.
For the direction we use (7). Let be a scaling and a cycle. We have for every , and hence . Since this inequality holds for every , it follows that .
For the reverse direction, consider the following optimization problem.
| (8) |
Note that any feasible solution can be rescaled to a feasible solution for with equal objective value. As such, we can strengthen the condition to without changing the objective value. This makes it clear that the optimum value is achieved by a feasible solution.
Any rescaling provides a feasible solution with objective value , which means that the optimal value of (8) is . Moreover, (8) can be written as
| (9) |
This is the dual of a minimum-mean cycle problem with respect to the cost function . Therefore, an optimal solution corresponds to the cycle maximizing , or in other words, maximizing . ∎
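Written out with explicit symbols, which are our assumption here (writing $\kappa(W_d)$ for the circuit imbalance measure of the rescaled subspace $W_d = \operatorname{diag}(d)W$, and $\kappa_{ij}$ for the circuit ratio of the edge $(i,j)$ in the circuit ratio digraph $G$), the min-max relation of Theorem 2.13 takes the form:

```latex
\min_{d > 0} \, \kappa(W_d)
  \;=\;
  \max \left\{ \Bigl( \prod_{(i,j) \in C} \kappa_{ij} \Bigr)^{1/|C|}
       \;:\; C \text{ is a cycle in } G \right\} .
```

The proof above shows the two directions separately: every rescaling upper bounds every cycle's geometric mean, and LP duality for the minimum-mean cycle problem closes the gap.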
The following example shows that can be arbitrarily big.
Example 2.14.
Take , where . Then and are circuits with and . Hence, by Theorem 2.13, we see that .
The following corollary of Theorem 2.13 is particularly useful. It asserts that any circuit containing and yields a approximation to .
Corollary 2.15.
We are given a linear subspace and , , and a circuit with . Let be the corresponding vector with . Then,
Proof.
The second inequality follows by definition. For the first inequality, note that the same circuit yields . Therefore, .
From Theorem 2.13 we see that , giving , completing the proof. ∎
2.3 Finding circuits: a detour in matroid theory
We now show how to efficiently obtain a family such that for any , includes a circuit containing both and , provided there exists such a circuit.
We need some simple concepts and results from matroid theory. We refer the reader to [Sch03, Chapter 39] or [Fra11, Chapter 5] for definitions and background. Let be a matroid on ground set with independent sets . The rank of a set is the maximum size of an independent set contained in . The maximal independent sets are called bases. All bases have the same cardinality .
For the matrix , we will work with the linear matroid , where a subset is independent if the columns are linearly independent. Note that under the assumption that has full row rank.
The circuits of the matroid are the inclusion-wise minimal non-independent sets. Let be an independent set, and such that . Then, there exists a unique circuit that is called the fundamental circuit of with respect to . Note that .
The matroid is separable if the ground set can be partitioned into two nonempty subsets such that if and only if . In this case, the matroid is the direct sum of its restrictions to and . In particular, every circuit is fully contained in or in .
For the linear matroid , separability means that . In this case, solving (LP) can be decomposed into two subproblems, restricted to the columns in and in , and .
Hence, we can focus on non-separable matroids. The following characterization is well-known, see e.g. [Fra11, Theorems 5.2.5, 5.2.7–5.2.9]. For a hypergraph , we define the underlying graph such that if there is a hyperedge with . That is, we add a clique corresponding to each hyperedge. The hypergraph is called connected if the underlying graph is connected.
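The connectivity notion above can be tested without materializing the cliques of the underlying graph: merging the vertices of each hyperedge in a union-find structure is equivalent. A minimal sketch (the function name and interface are our own):

```python
# Connectivity of a hypergraph via its underlying graph, using union-find.
# A hyperedge {v1, ..., vk} is handled by merging all its vertices into one
# component, which is equivalent to adding a clique on the hyperedge.

def hypergraph_connected(n, hyperedges):
    parent = list(range(n))

    def find(x):
        # path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for edge in hyperedges:
        edge = list(edge)
        for v in edge[1:]:
            union(edge[0], v)

    return len({find(v) for v in range(n)}) == 1
```

For example, `hypergraph_connected(4, [{0, 1}, {1, 2, 3}])` is connected, while `hypergraph_connected(4, [{0, 1}, {2, 3}])` is not.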
Proposition 2.16.
For a matroid , the following are equivalent:
-
is non-separable.
-
The hypergraph of the circuits is connected.
-
For any base of , the hypergraph formed by the fundamental circuits is connected.
-
For any , there exists a circuit containing and .
Proof.
For the implication b c, assume for a contradiction that the hypergraph of the fundamental circuits with respect to is not connected. This means that we can partition such that for each , , and for each , . Consequently, , , and therefore . It is easy to see that this property is equivalent to separability into and ; see e.g. [Fra11, Theorem 5.2.7] for a proof.
We now give a different proof of c d that will be convenient for our algorithmic purposes. First, we need a simple lemma that is commonly used in matroid optimization, see e.g. [Fra11, Lemma 13.1.11] or [Sch03, Theorem 39.13].
Lemma 2.17.
Let be an independent set of a matroid , and , such that is dependent for each . Further, assume that for each , and for all . Then, .
To prove the lemma, one simply shows that exchanging for maintains independence, after which one applies induction on . Based on this lemma, we show the following exchange property.
Lemma 2.18.
Let be a basis of the matroid , and let , and . Assume , , and for each , . Then contains a unique circuit , and .
The situation described here corresponds to a minimal path in the hypergraph of the fundamental circuits with respect to a basis . The hyperedges form a path from to such that no shortcut is possible (note that this is weaker than requiring a shortest path).
Proof of Lemma 2.18.
Note that since and is a basis. For any , we can use Lemma 2.17 to show that (and thus, is a basis). To see this, we apply Lemma 2.17 for the ordered sets and .
Consequently, every circuit in must contain the entire set . The uniqueness of the circuit in follows by the well-known circuit axiom asserting that if , and , then there exists a circuit such that , contradicting the claim that every circuit in contains the entire set . ∎
We are ready to describe the algorithm that will be used to obtain lower bounds on all values. For a matrix , we let Find-Circuits() denote the subroutine described in the theorem below for the linear matroid .
Theorem 2.19.
Given , there exists an time algorithm Find-Circuits() that obtains a decomposition of into a direct sum of non-separable linear matroids, and returns a family of circuits such that if and are in the same non-separable component, then there exists a circuit in containing both and . Further, for each in the same component, the algorithm returns a value as the maximum of such that , for some containing and . For these values, .
Proof.
Once we have found the set of circuits , and computed as in the statement, the inequalities follow easily. The first inequality is by the definition of , and the second inequality is from Corollary 2.15.
We now turn to the computation of . We first obtain a basis via Gauss-Jordan elimination in time . Recall the assumption that has full row rank. Let us assume that is the set of the first indices. The elimination transforms it to the form , where corresponds to the non-basis elements. In this form, the fundamental circuit is the support of the th column of together with for every . We let denote the set of all these fundamental circuits.
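The extraction of fundamental circuits from the reduced form can be sketched as follows: reduce the matrix over the rationals, and read off, for each non-basis column, the basis indices with a nonzero entry in that column. This is a sketch assuming full row rank; the function name and return convention are our own.

```python
from fractions import Fraction

def fundamental_circuits(A):
    """Gauss-Jordan reduce A (rows m, cols n) so the basis columns form an
    identity, then return (basis, circuits): for each non-basis column j,
    its fundamental circuit is the set of basis indices with a nonzero
    entry in column j, together with j itself."""
    m, n = len(A), len(A[0])
    M = [[Fraction(x) for x in row] for row in A]
    basis, row = [], 0
    for col in range(n):
        piv = next((r for r in range(row, m) if M[r][col] != 0), None)
        if piv is None:
            continue
        M[row], M[piv] = M[piv], M[row]
        inv = M[row][col]
        M[row] = [x / inv for x in M[row]]
        for r in range(m):
            if r != row and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[row])]
        basis.append(col)
        row += 1
        if row == m:
            break
    circuits = []
    for j in range(n):
        if j in basis:
            continue
        support = {basis[r] for r in range(len(basis)) if M[r][j] != 0}
        if support:
            circuits.append(support | {j})
    return basis, circuits
```

On the matrix with rows (1, 0, 1, 1) and (0, 1, 1, 0), the basis is {0, 1} and the fundamental circuits are {0, 1, 2} and {0, 3}.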
We construct an undirected graph as follows. For each , we add a clique between the nodes in . This graph can be constructed in time.
The connected components of correspond to the connected components of restricted to . Thus, due to the equivalence shown in Proposition 2.16 we can obtain the decomposition by identifying the connected components of . For the rest of the proof, we assume that the entire hypergraph is connected; connectivity can be checked in time.
We initialize as . We will then check all pairs , . If no circuit exists with , then we will add such a circuit to as follows.
Assume first . We can find a shortest path in between the sets and in time . This can be represented by the sequences of points , , , and as in Lemma 2.18. According to the lemma, contains a unique circuit that contains all ’s, including and .
We now show how this circuit can be identified in time, along with the vector . Let be the submatrix corresponding to the columns in . Since is unique up to scaling, we can set . Note that for each , the row of corresponding to contains only two nonzero entries: and . Thus, the value can be propagated to assigning unique values to . Once these values are set, there is a unique extension of to the indices in the basis. Thus, we have identified as the unique element of up to scaling. The circuit is obtained as . Clearly, the above procedure can be implemented in time.
The argument easily extends to finding circuits for the case . If , then for any choice of and as in Lemma 2.18 such that and for , the unique circuit in also contains . This follows from Lemma 2.17 by taking and , which proves that . Similarly, if with and for , taking and gives .
The bottleneck for the running time is finding the shortest paths for the pairs, in time each. ∎
The triangle inequality
An interesting additional fact about the circuit ratio graph is that the logarithms of the weights satisfy the triangle inequality. The proof uses arguments similar to those in the proof of Theorem 2.19 above.
Lemma 2.20.
-
For any distinct in the same connected component of , and any with , , there exist circuits , , such that .
-
For any distinct in the same connected component of , .
Proof.
Note that part (ii) immediately follows from part (i) when taking such that . We now prove part (i).
Let be a full-rank matrix with . If , then the columns are linearly dependent. Writing , we have . Let be any circuit solution with , and hence . By assumption, the vector will satisfy and have . We know that is a circuit solution, because any circuit could, by the above process in reverse, be used to produce a kernel solution with strictly smaller support than , contradicting the assumption that is a circuit solution. Now we have by construction. Thus, and are the circuit solutions we are looking for.
Now assume . If , the statement is trivially true with , so assume . Pick , and set . Assume without loss of generality that and apply row operations to such that is an identity submatrix and . Then the column has support given by , for otherwise could not be in the kernel. The given circuit solution satisfies for all , and in particular .
Take any minimal set such that and the linear matroid induced by is non-separable. We show that we can uniquely lift any vector to a vector . Since this lift will send circuit solutions to circuit solutions by uniqueness, it suffices to find our desired circuits as solutions to the smaller linear system.
Consider a circuit solution such that , which exists by Proposition 2.16d. Minimality of and Proposition 2.16b imply . This gives rise to the non-zero vector . Moreover, , as and minimality of . This provides us with clear linear relations between any two entries in of any vector in , implying that we can apply row operations to such that has non-zero entries only in the column . Note that these row operations leave unchanged. From this, we can see that any element in can be uniquely lifted to an element in . Hence we can focus on .
If , then any satisfies and, in particular, any circuit contains and fulfills . Choosing concludes the case.
Otherwise, we know that or . Any circuit in can be lifted uniquely to an element in since is an identity matrix and we can set the entries of individually to satisfy the equalities. Note that this lifted vector is a circuit as well. Hence we may restrict our attention to the matrix . If the columns are linearly dependent, then any circuit solution to , such as , is easily transformed into a circuit solution to and we are done.
If are independent, we can write , where . For , which is non-zero since by the independence assumption, we observe that and are the circuits we are looking for. ∎
2.4 Approximating and
Equipped with Theorem 2.13 and Theorem 2.19, we are ready to prove Theorem 2.5, that is, given , find an approximately optimal rescaling of the columns. Recall that we defined when . We can similarly define , and approximates just as in Theorem 2.19.
Proof of Theorem 2.5.
Let us run the algorithm Find-Circuits described in Theorem 2.19 to obtain the values such that . We let be the circuit ratio digraph, that is, if .
To show the first statement on approximating , we simply set . Then, follows by Theorem 2.9.
For the second statement on finding a nearly optimal rescaling for , we consider the following optimization problem, which is an approximate version of (8) from Theorem 2.13.
| (10) |
Let be an optimal solution to (10) with value . We will prove that .
First, observe that for any . Now, let be such that . The vector is a feasible solution to (10), and so . Hence we find that gives a rescaling with .
Using Theorem 2.9, we show that . Indeed,
We can obtain the optimal value of (10) by solving the corresponding maximum-mean cycle problem (see Theorem 2.13). It is easy to develop a multiplicative version of the standard dynamic programming algorithm for the classical minimum-mean cycle problem (see e.g. [AMO93, Theorem 5.8]) that finds the optimum of (10) directly, in the same time.
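The multiplicative version amounts to running Karp's dynamic program on logarithms of the weights, since maximizing the geometric mean of a cycle is maximizing the arithmetic mean of its log-weights. A floating-point sketch (the function name is ours; an exact implementation would work with the log-weights symbolically):

```python
import math

def max_geometric_mean_cycle(n, edges):
    """Karp's dynamic program for the maximum mean cycle, run on
    log-weights. edges is a list of (u, v, w) with w > 0. Returns the
    maximum geometric mean of a cycle's weights, or None if acyclic."""
    NEG = float("-inf")
    # D[k][v] = max total log-weight over walks with exactly k edges ending at v
    D = [[NEG] * n for _ in range(n + 1)]
    for v in range(n):
        D[0][v] = 0.0
    for k in range(1, n + 1):
        for (u, v, w) in edges:
            if D[k - 1][u] > NEG:
                D[k][v] = max(D[k][v], D[k - 1][u] + math.log(w))
    best = None
    for v in range(n):
        if D[n][v] == NEG:
            continue
        # Karp's formula (max variant): min over k of the per-edge surplus
        val = min((D[n][v] - D[k][v]) / (n - k)
                  for k in range(n) if D[k][v] > NEG)
        best = val if best is None else max(best, val)
    return None if best is None else math.exp(best)
```

On the two-cycle with weights 4 and 1, the optimum is the geometric mean 2.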
It is left to find the labels , such that for all . We define the following weighted directed graph. We associate the weight with every , and add an extra source vertex with edges of weight for all .
By the choice of , this graph does not contain any negative weight directed cycles. We can compute the shortest paths from to all nodes in using the Bellman-Ford algorithm; let be the shortest path label for . We then set .
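The label computation is a standard Bellman-Ford run; as in the proof, one first adds an auxiliary source with zero-weight edges to all nodes, so every node is reachable and no negative cycle exists. A minimal sketch (names are ours); the rescaling is then recovered by exponentiating the shortest-path labels.

```python
def bellman_ford(n, edges, source):
    """Shortest-path labels from `source`; edges are (u, v, w) with real
    weights and no negative cycle reachable from source. Unreachable
    nodes keep the label float('inf')."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0.0
    for _ in range(n - 1):
        changed = False
        for (u, v, w) in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:  # early exit once labels stabilize
            break
    return dist
```

For instance, with edges (0,1) of weight 2, (1,2) of weight -1 and (0,2) of weight 5, the labels from node 0 are 0, 2 and 1.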
The running time of the whole algorithm is bounded by . It is dominated by the complexity of Find-Circuits together with that of solving the minimum-mean cycle problem and the shortest path computation. ∎
3 A scaling-invariant layered least squares interior-point algorithm
3.1 Preliminaries on interior-point methods
In this section, we introduce the standard definitions, concepts and results from the interior-point literature that will be required for our algorithm. We consider an LP problem in the form (LP), or equivalently, in the subspace form (2) for . We let
Recall the central path defined in (CP), with denoting the central path point corresponding to . We let denote the primal and dual optimal solutions to (LP) that correspond to the limit of the central path for .
For a point , the normalized duality gap is .
The -neighborhood of the central path with opening is the set
Throughout the paper, we will assume is chosen from ; in Algorithm 2 we use the value . The following proposition gives a bound on the distance between and if . See e.g. [Gon92, Lemma 5.4].
Proposition 3.1.
Let for and , and consider the central path point . For each ,
We will often use the following immediate corollary.
Corollary 3.2.
Let for , and . Then for each
A key property of the central path is “near monotonicity”, formulated in the following lemma, see [VY96, Lemma 16].
Lemma 3.3.
Let be a central path point for and be a central path point for . Then . Further, for the optimal solution corresponding to the central path limit , we have .
Proof.
We now show and the second statement. For , see the proof of [VY96, Lemma 16]. Since and , we have . This can be rewritten as . The right hand side equals . Dividing by , and noting that for all , we obtain
Hence by . We get the second statement by taking the limit . ∎
3.2 Predictor and corrector steps
Given , the search directions commonly used in interior-point methods are obtained as the solution to the following linear system for some .
| (11) | ||||
| (12) | ||||
| (13) |
Predictor-corrector methods, such as the Mizuno-Todd-Ye Predictor-Corrector (MTY P-C) algorithm [MTY93], alternate between two types of steps. In predictor steps, we use . This direction is also called the affine scaling direction, and will be denoted as throughout. In corrector steps, we use . This gives the centrality direction, denoted as .
In the predictor steps, we make progress along the central path. Given the search direction on the current iterate , the step-length is chosen maximal such that we remain in , i.e.
Thus, we obtain a point . The corrector step finds a next iterate , where is the centrality direction computed at . The next proposition summarizes well-known properties, see e.g. [Ye97, Section 4.5.1].
Proposition 3.4.
Let for .
-
For the affine scaling step, we have .
-
The affine scaling step-length is
-
For , and , we have and .
-
After a sequence of predictor and corrector steps, we obtain an iterate such that .
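In implementations, the predictor step length is governed by the standard positivity ratio test sketched below; remaining in the neighborhood imposes an additional cap depending on the opening parameter, which this sketch omits. The function name and the safety factor are our own.

```python
def max_step_length(x, s, dx, ds, safety=0.999):
    """Largest alpha with x + alpha*dx > 0 and s + alpha*ds > 0 (ratio
    test). The central-path neighborhood condition would further cap
    alpha; omitted in this sketch."""
    alpha = 1.0
    for val, d in list(zip(x, dx)) + list(zip(s, ds)):
        if d < 0:
            alpha = min(alpha, -val / d)
    return safety * alpha
```

For example, with x = (1, 1), s = (1,), dx = (-2, 1), ds = (-0.5,), the binding coordinate is the first primal one and the step length is 0.999 · 0.5.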
Minimum norm viewpoint and residuals
From Proposition 3.1, we see that if , and , then for each ,
| (16) |
The matrix will be often used for rescaling in the algorithm. That is, for the current iterate in the interior-point method, we will perform projections in the space . To simplify notation, for , we use and as shorthands for and . The subspace will be fixed throughout.
It is easy to see from the optimality conditions that the components of the affine scaling direction are the optimal solutions of the following minimum-norm problems.
| (17) | ||||
Following [MT05], for a search direction , we define the residuals as
| (18) |
Hence, the primal affine scaling direction is the one that minimizes the -norm of the primal residual , and the dual affine scaling direction minimizes the -norm of the dual residual . The next lemma summarizes simple properties of the residuals, see [MT05].
Lemma 3.5.
For such that and the affine scaling direction , we have
-
(19) -
-
We have , and for each , .
-
Proof.
For a subset , we define
| (20) |
The next claim shows that for the affine scaling direction, a small yields a long step; see [MT05, Lemma 2.5].
Lemma 3.6.
Let for . Then for the affine scaling step, we have
3.3 Layered-least-squares direction
Let be an ordered partition of . (In contrast to how ordered partitions were defined in [MT05], we use the term ordered only for the -tuple , which is to be viewed independently of .) For , we use the notations , , and similarly and . We will also refer to the sets as layers, and as a layering. Layers with lower indices will be referred to as ‘higher’ layers.
Given , and the layering , the layered-least-squares (LLS) direction is defined as follows. For the primal direction, we proceed backwards, with . Assume the components on the lower layers have already been determined. We define the components in as the coordinate projection , where the affine subspace is defined as the set of minimizers
| (21) |
The dual direction is determined in the forward order of the layers . Assume we already fixed the components on the higher layers. Then, for
| (22) |
The component is obtained as the optimal for the final layer . We use the notation and analogously to the affine scaling direction. This search direction was first introduced in [VY96].
The affine scaling direction is a special case for the single element partition. In this case, the definitions (21) and (22) coincide with those in (17).
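Each subproblem in (21)-(22) is a linear least-squares solve over an affine subspace, processed layer by layer. As a building block, a dense least-squares solver can be sketched in pure Python via the normal equations (illustrative only; the function name is ours, and a careful implementation would use QR):

```python
def lstsq(A, b):
    """Solve min ||A x - b||_2 via the normal equations A^T A x = A^T b.
    Assumes A^T A is invertible; adequate for small dense systems."""
    m, n = len(A), len(A[0])
    # Form the normal equations G x = c
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(G[r][i]))
        G[i], G[p] = G[p], G[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, n):
            f = G[r][i] / G[i][i]
            for j in range(i, n):
                G[r][j] -= f * G[i][j]
            c[r] -= f * c[i]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - sum(G[i][j] * x[j] for j in range(i + 1, n))) / G[i][i]
    return x
```

For the overdetermined system with rows (1,0), (0,1), (1,1) and right-hand side (1, 1, 2), the least-squares solution is (1, 1).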
3.3.1 A linear system viewpoint
We now present an equivalent definition of the LLS step, generalizing the linear system (12)-(13). We use the subspace notation. With this notation, (12)-(13) for the affine scaling direction can be written as
| (23) |
Recall that (23) is equivalent to .
Given the layering and , for each we define the subspaces
It is easy to see that these two subspaces are orthogonal complements. Analogously to (23), the primal LLS step is obtained as the unique solution to the linear system
| (24) |
and the dual LLS step is the unique solution to
| (25) |
It is important to note that in (24) may be different from , and in (25) may be different from . In fact, and can only be the case for the affine scaling step.
The following lemma proves that the above linear systems are indeed uniquely solved by the LLS step.
Lemma 3.7.
For , , , and , let be defined by
Then is well-defined and
for every .
In the notation of the above lemma we have, for ordered partitions , , and with , that and .
Proof of Lemma 3.7.
We first prove the equality , and by a similar argument we have . By duality, this last equality tells us that
Thus, the linear decomposition defining has a solution and its solution is unique.
Suppose . We prove by induction on , starting at . The induction hypothesis is that , which is an empty requirement when . The hypothesis together with the assumption is equivalent to , and implies . Since we also have by assumption, which is the orthogonal complement of , we must have . Hence, by induction . This finishes the proof that is well-defined.
Next we prove that is a minimizer of . The optimality condition is for to be orthogonal to for any . By the LLS equation, we have , where . Noting then that for , the optimality condition follows immediately. ∎
With these tools, we can prove that the lifting costs are self-dual. This explains the reverse order in the dual vs primal LLS step and justifies our attention on the lifting cost in a self-dual algorithm. The next proposition generalizes the result of [GL97]. Note that it gives a proof of Proposition 2.1d.
Proposition 3.8.
For a linear subspace and index set with ,
In particular, .
Proof.
We first treat the case where or . If then . Furthermore, in this case , and thus . In particular, and . Symmetrically, if then , and .
We now restrict our attention to the case where both . Under this assumption, we show that and thus that . Note that by non-emptiness, we clearly have that .
We formulate a more general claim. Let be linear subspaces such that and . Note that for the orthogonal complements in , we also have , and .
Claim 3.9.
Let be linear subspaces such that and . Thus, for , there are unique decompositions with , and with and . Let be the map sending . Let be the map sending . Then, .
Proof.
To prove the statement, we claim that it suffices to show that if then . To prove sufficiency, note that by symmetry, we also get that if then . Since by assumption, we always have , and thus the equality must hold in all cases. We now assume and show .
Representing as an matrix, we write using a singular value decomposition with . As such, is an orthonormal basis of , since the , and is an orthonormal basis of , since , noting that we have restricted to the singular vectors associated with positive singular values. By assumption, we have that .
The proof is complete by showing that
| (26) |
and that .
The map is a linear projection with . Hence and for all .
We show that can be decomposed as such that and .
The containment is immediate. To show , we need that for all . For , this is true since and . For , we have since and . Consequently, .
We compute , since , and . This verifies (26), and thus . ∎
To prove the lemma, we define , and and let and be as in Claim 3.9. By assumption, and . Applying Lemma 3.7, satisfy the conditions of Claim 3.9 and . In particular, . Using the fact that and , we similarly get that , where . By (21) we have, for any , that . Thus .
To finish the proof of the lemma from the claim, we show that . By a symmetric argument we get .
If , then because any with must have since is orthogonal to . But and are orthogonal, so because is an orthogonal decomposition.
If , then and hence . Since , we see that . As such, for any , we see that and . For , we thus have that
Since , we must have that is maximized by some . From it is clear that is maximized by some . Now, , so any maximizing satisfies . Therefore, . ∎
3.3.2 Partition lifting scores
A key insight is that if the layering is “well-separated”, then we indeed have , that is, the LLS direction is close to the affine scaling direction. This will be shown in Lemma 3.11. The notion of “well-separatedness” can be formalized as follows. Recall the definition of the lifting score (4). The lifting score of the layering of with respect to is defined as
For , we use and . When the context is clear, we omit and write and .
The following important duality claim asserts that the lifting score of a layering equals the lifting score of the reverse layering in the orthogonal complement subspace. It is an immediate consequence of Proposition 3.8.
Lemma 3.10.
Let be a linear subspace, . For an ordered partition , let denote the reverse ordered partition. Then, we have
Proof.
The next lemma summarizes key properties of the LLS steps, assuming the partition has a small lifting score. The proof is deferred to Section 5.
Lemma 3.11.
Let for , let and . Let be a layering with , and let denote the LLS direction for the layering . Then the following properties hold.
-
We have
(27) (28) -
For the affine scaling direction ,
-
For the residuals of the LLS steps we have . For each , .
-
Let , and define the step length as
We obtain the following bounds on the progress in the LLS step:
-
We have if and only if . These are further equivalent to being an optimal solution to (LP).
3.4 The layering procedure
Our algorithm performs LLS steps on a layering with a low lifting score. A further requirement is that within each layer, the circuit imbalances defined in (6) are suitably bounded. The rescaling here is with respect to for the current iterate . To define the precise requirement on the layering, we first introduce an auxiliary graph. Throughout we use the parameter
| (29) |
The auxiliary graph
For a vector and , we define the directed graph such that if . This is a subgraph of the circuit ratio digraph studied in Section 2, including only the edges where the circuit ratio is at least the threshold . Note that we do not have direct access to this graph, as we cannot efficiently compute the values .
At the beginning of the entire algorithm, we run the subroutine Find-Circuits() as in Theorem 2.19, where . We assume the matroid is non-separable. For a separable matroid, we can solve the subproblems of our LP on the components separately. Thus, for each , , we obtain an estimate . These estimates will be gradually improved throughout the algorithm.
Note that and . If , then we are guaranteed .
Definition 3.12.
Define to be the directed graph with edges such that ; clearly, is a subgraph of .
Lemma 3.13.
Let . For every , , . Consequently, for any , at least one of or .
Proof.
Recall the definition of from Theorem 2.19. This is defined as the maximum of such that , for some containing and . For the same vector , we get . Consequently, , and also . The second claim follows by the assumption . ∎
Balanced layerings
We are ready to define the requirements on the layering in the algorithm. In the algorithm, will correspond to the scaling of the current iterate .
Definition 3.14.
Let . The layering of is -balanced if
-
, and
-
is strongly connected in for all .
The following lemma shows that within each layer, the values are within a bounded range. This will play an important role in our potential analysis.
Lemma 3.15.
Let and , and , .
-
If the graph contains a directed path of at most edges from to , then
-
If contains a directed path of at most edges from to , then
Proof.
For part a, let be a path in in from to with . That is, for each . Theorem 2.13 yields
since and . Part b follows using part a for and , and that according to Lemma 3.13. ∎
Description of the layering subroutine
Consider an iterate of the algorithm with . The subroutine Layering, described in Algorithm 1, constructs a -balanced layering. We recall that the approximate auxiliary graph with respect to is as in Definition 3.12.
We now give an overview of the subroutine Layering. We start by computing the strongly connected components (SCCs) of the directed graph . The edges of this graph are obtained using the current estimates . According to Lemma 3.13, we have or for every , . Hence, there is a linear ordering of the components such that whenever , , and . We call this the ordering imposed by .
Next, for each , we use the subroutine Verify-Lift described after Lemma 2.11. If the subroutine returns ‘pass’, then we conclude , and proceed to the next layer. If the answer is ‘fail’, then the subroutine returns as certificates , , and such that . In this case, we update to the higher value . We add to an edge set ; this edge set was initialized to contain . After adding , all components between those containing and will be merged into a single strongly connected component. To see this, recall that if and for , then according to Lemma 3.13.
Finally, we compute the strongly connected components of . We let denote their unique acyclic order, and return these layers.
Lemma 3.16.
The subroutine Layering returns a -balanced layering in time.
The difficult part of the proof is showing the running time bound. We note that the weaker bound can be obtained by a simpler argument.
Proof.
We first verify that the output layering is indeed -balanced. For property a of Definition 3.14, note that each component is the union of some of the ’s. In particular, for every , the set for some . Assume now . At iteration of the main loop, the subroutine Verify-Lift returned the answer ‘fail’, and a new edge was added with , . Note that we already had , since for some , and for . This contradicts the choice of as a maximal strongly connected component in .
Property b follows since all new edges added to have . Therefore, is a subgraph of .
Let us now turn to the computational cost. The initial strongly connected components can be obtained in time , and the same bound holds for the computation of the final components. (The latter can also be done in linear time, exploiting the special structure that the components have a complete linear ordering.)
The second computational bottleneck is the subroutine Verify-Lift. We assume a matrix is computed at the very beginning such that . We first explain how to implement one call to Verify-Lift in time. We then sketch how to amortize the work across the different calls to Verify-Lift, using the nested structure of the layering, to implement the whole procedure in time. To turn this into , we recall that the layering procedure is the same for and due to duality (Proposition 3.8). Since , applying this subroutine on instead of achieves the same result but in time .
We now explain the implementation of Verify-Lift, where we are given as input and the basis matrix as above with . Clearly, the running time is dominated by the computation of the set and the matrix satisfying , for . We explain how to compute and from using column operations (note that these preserve the range). The valid choices for are in correspondence with maximal sets of linearly independent rows of , noting then that where . Let and . By applying column operations to , we can compute such that ( identity) and . This requires time using Gaussian elimination. At this point, note that , and . To compute , we must transform the columns of into minimum norm lifts of into , for all . For this purpose, it suffices to make the columns of orthogonal to the range of . Applying Gram-Schmidt orthogonalization, this requires time. From here, the desired matrix . Thus, the total running time of Verify-Lift is .
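The orthogonalization step at the heart of this implementation can be sketched as follows. This is a minimal pure-Python Gram-Schmidt routine (an illustration with hypothetical small dense columns, not the paper's actual matrices) that makes the columns of one matrix orthogonal to the range of another, which is how the minimum-norm lifts are obtained above:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonalize_against(B_cols, V_cols, eps=1e-12):
    """Return the columns of B projected onto the orthogonal complement
    of span(V_cols), via classical Gram-Schmidt. Columns are lists."""
    # Orthonormalize the columns spanning the subspace first.
    Q = []
    for v in V_cols:
        w = list(v)
        for q in Q:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        nrm = dot(w, w) ** 0.5
        if nrm > eps:                 # skip linearly dependent columns
            Q.append([wi / nrm for wi in w])
    # Subtract from each column of B its projection onto the subspace.
    out = []
    for b in B_cols:
        w = list(b)
        for q in Q:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        out.append(w)
    return out
```

For numerical robustness at scale one would use a QR factorization (or modified Gram-Schmidt) instead; the classical variant above suffices to illustrate the operation.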
We now sketch how to amortize the work of all the calls of Verify-Lift during the layering algorithm, to achieve a total running time. Let denote the candidate SCC layering. Our task is to compute the matrices , , needed in the calls to Verify-Lift on , , in total time. We achieve this in three steps, working with the basis matrix as above. Firstly, by applying column operations to , we compute sets and , , such that , where , and , . Note that this enforces . This computation requires time using Gaussian elimination, and achieves , and , for all .
From here, we block orthogonalize , such that the columns of are orthogonal to the range of , . Applying an appropriately adapted Gram-Schmidt orthogonalization, this requires time. Note that this operation maintains , , since . At this point, for the columns of are in correspondence with minimum norm lifts of into , for all . Note that to compute the matrix we need the lifts of , for all instead of just .
We now compute the matrices in this order via the following iterative procedure. Let denote the iteration counter, which decrements from to . For (first iteration), we let and decrement . For , we eliminate the entries of by using the columns of . We then let and decrement . To justify correctness, one need only notice that at the end of iteration , we maintain the orthogonality of to the range of and that is the appropriate identity. The cost of this procedure is the same as a full run of Gaussian elimination and thus is bounded by . The calls to Verify-Lift during the layering procedure can thus be executed in amortized time as claimed. ∎
3.5 The overall algorithm
Algorithm 2 presents the overall algorithm LP-Solve. We assume that an initial feasible solution is given. We address this in Section 7, by adapting the extended system used in [VY96]. We note that this subroutine requires an upper bound on . Since computing is hard, we can implement it by a doubling search on , as explained in Section 7. Other than for initialization, the algorithm does not require an estimate on .
The algorithm starts with the subroutine Find-Circuits as in Theorem 2.19. The iterations are similar to the MTY Predictor-Corrector algorithm [MTY93]. The main difference is that certain affine scaling steps are replaced by LLS steps. In every predictor step, we compute the affine scaling direction, and consider the quantity . If this is above the threshold , then we perform the affine scaling step. However, in case , we use the LLS direction instead. In each such iteration, we call the subroutine Layering() (Algorithm 1) to compute the layers, and we compute the LLS step for this layering.
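The predictor step's choice rule described above can be summarized schematically (the function and its arguments are placeholders of ours, not the paper's notation: the affine scaling direction is always computed first, and the LLS direction is used only when the residual quantity falls below the threshold):

```python
def choose_predictor_direction(residual_quantity, threshold,
                               affine_direction, lls_direction):
    """Schematic of the step-choice rule: above the threshold, take the
    affine scaling step; otherwise compute the layering and take the
    layered-least-squares (LLS) step instead."""
    if residual_quantity > threshold:
        return ("affine", affine_direction)
    return ("lls", lls_direction)
```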
Another important difference is that the algorithm does not require a final rounding step. It terminates with the exact optimal solution once a predictor step is able to perform a full step with .
Theorem 3.17.
Remark 3.18.
Whereas using LLS steps enables us to give a strong bound on the total number of iterations, finding LLS directions incurs a significant computational overhead compared to finding affine scaling directions. The layering can be computed in time (Lemma 3.16), and the LLS steps also require time, see [VY96, MMT98]. This is in contrast to the computational cost of an affine scaling direction. Here, is the matrix multiplication exponent [VW12].
We now sketch a possible approach to amortize the computational cost of the LLS steps over the sequence of affine scaling steps. It was shown in [MT05] that for the MTY P-C algorithm, the “bad” scenario between two crossover events amounts to a series of affine scaling steps where the progress in increases exponentially from every iteration to the next. This corresponds to the term in their running time analysis. Roughly speaking, such a sequence of affine scaling steps indicates that an LLS step is necessary.
Hence, we could observe these accelerating sequences of affine scaling steps, and perform an LLS step after we see a sequence of length . The progress made by these affine scaling steps offsets the cost of computing the LLS direction.
4 The potential function and the overall analysis
Let and correspond to the point on the central path. For , , we define
and the main potentials in the algorithm as
The quantity is motivated by the bounds in Lemma 3.15. The next statement is an immediate consequence of this lemma and (16).
Lemma 4.1.
Let for , let , and . Let , . If the graph contains a path from to of at most edges, then . If there is a path of at most edges from to , then .
If , then and cannot be together on a layer of size , and cannot be on a layer preceding the layer containing in any -balanced layering, where with .
Our potentials can be seen as fine-grained analogues of the crossover events analyzed in [VY96, MT03, MT05]. Roughly speaking, a crossover event corresponds to increasing above , meaning that and cannot be contained in the same layer after the normalized duality gap decreases below .
In what follows, we formulate four lemmas and use them to prove Theorem 3.17. For the lemmas, we only highlight some key ideas here, and defer the full proofs to Section 6.
For a solution , refers to the LLS direction found in the algorithm, and and denote the residuals as in (18). For a subset recall the definition
Another important quantity in the analysis is
for a subset . For a layering , we let
The key idea of the analysis is to extract information about the optimal solution from the LLS direction. The first main lemma shows that if is large on some layer , then for at least one index , ; the analogous statement holds for .
Lemma 4.2.
Let for , and let be a -balanced layering, and let be the corresponding LLS direction. Then the following statement holds for every :
-
There exists such that
(30) -
There exists such that
(31)
We outline the main idea of the proof of part a; part b follows analogously using the duality of the lifting scores (Lemma 3.10). On layer , the LLS step minimizes , subject to . By making use of the assumption , we can show the existence of a point such that is small, and . By the choice of , we have . Therefore, cannot be much smaller than . Noting that , we obtain a lower bound on for some .
We emphasize that the lemma only shows the existence of such indices and , but does not provide an efficient algorithm for identifying them. It is also useful to note that for any , according to Lemma 3.11c. Thus, for each , we obtain a positive lower bound either in case a or in case b.
The next lemma shows how we can argue for increase in the potential function value for multiple pairs of variables, if we have lower bounds on both and for some , along with a lower bound on .
Lemma 4.3.
Let for , let and . Let and such that for the optimal solution , we have and , and assume . Let be the normalized duality gap after iterations subsequent to the iterate . Then , and for every , either , or .
We note that and as in the lemma are necessarily different, since would imply .
Let us illustrate the idea of the proof of . For and as in the lemma, and for a central path element for , we have and by the near-monotonicity of the central path (Lemma 3.3). Note that
where the last inequality uses Corollary 3.2. Consequently, as sufficiently decreases, will become much larger than . The claim on can be shown by using the triangle inequality shown in Lemma 2.20.
Assume now for some in the LLS step. Then, Lemma 4.2 guarantees the existence of such that . Further, Lemma 4.1 gives . Hence, Lemma 4.3 is applicable for and with .
The overall potential argument in the proof of Theorem 3.17 uses Lemma 4.3 in three cases: (Lemma 4.2 is applicable as above); and (Lemma 4.4); and and (Lemma 4.5). Here, refers to the value of after the LLS step. Note that is well-defined, unless the algorithm terminated with an optimal solution.
To prove these lemmas, we need to study how the layers “move” during the LLS step. We let and . The assumption means that for each layer , either or ; we accordingly refer to -layers and -layers.
Lemma 4.4.
Let for , and let be a -balanced partition. Assume that , and let be the next iterate obtained by the LLS step with and assume . Let such that . If , then there exist such that and . Further, for any , we have .
For the proof sketch, without loss of generality, let , that is, is an -layer. The case can be treated analogously. Since the residuals and cannot both be small, Lemma 4.2 readily provides a such that . Using Lemma 3.3, .
The key ideas of showing the existence of an such that are the following. With , and , we write equalities and inequalities that hold up to constant and small polynomial factors. First, we show that (i) , and then, that (ii)
If we can show (i) and (ii) as above, we obtain that , and thus, for some .
Let us now sketch the first step. By the assumption , one can show , and therefore
The second part of the proof, namely, lower bounding , is more difficult. We only show it now for the special case when . That is, we have a single layer only; in particular, the LLS step is the same as the affine scaling step . The general case of multiple layers follows by making use of Lemma 3.11. In particular, for a sufficiently small , the LLS step is close to the affine scaling step.
Hence, assume that . Using the equivalent definition of the affine scaling step (17) as a minimum-norm point, we have . From Lemma 3.6, . Thus, we see that .
The final statement on lower bounding for any follows by showing that remains close to , and hence the values of and are sufficiently close for indices on the same layer (Lemma 6.1).
Lemma 4.5.
Let for , and let be a -balanced partition. Assume that , and let be the next iterate obtained by the LLS step with and assume . If , then there exist two layers and , and and , , and . Further, , and for each , .
Consider now any . Then, since is very close to 1, ; on the other hand, will “shoot down” near the small value . Conversely, for , , and will “shoot down” to a small value.
The key step of the analysis is showing that the increase in can be attributed to an -layer “crashing into” a -layer . That is, we show the existence of an edge for and , where and , . This can be achieved by analyzing the matrix used in the subroutine Verify-Lift.
For the layers and , we can use Lemma 4.2 to show that there exists an where is lower bounded, and there exists a where is lower bounded. The lower bound on and the upper bounds on the values can be shown by tracking the changes between the and values, and applying Lemma 4.1 both at and at .
Proof of Theorem 3.17.
We analyze the overall potential function . By definition, , and if then . By the iteration at we mean the iteration where the normalized duality gap of the current iterate is .
If at the end of an iteration, the algorithm terminates with an optimal solution. Recall from Lemma 3.11e that this happens if and only if at a certain iteration.
From now on, assume that . We distinguish three cases at each iteration. These cases are well-defined even at iterations where affine scaling steps are used. At such iterations, still refers to the LLS residuals, even if these have not been computed by the algorithm. (Case I) ; (Case II) and ; and (Case III) and .
Recall that the algorithm uses an LLS direction instead of the affine scaling direction whenever . Consider now the case when an affine scaling direction is used, that is, . According to Lemma 3.11b, . This implies that . Therefore, in cases II and III, an LLS step will be performed.
Starting with any given iteration, in each case we will identify a set of indices with , and start a phase of iterations (that can be either affine scaling or LLS steps). In each phase, we will guarantee that increases by at least . As we can decompose the total sequence of iterations into disjoint phases, this yields the bound on the total number of iterations.
We now consider each of the cases. We always let denote the normalized duality gap at the current iteration, and we let be the layer such that .
Case I: .
Lemma 4.2 guarantees the existence of such that . Further, according to Lemma 4.1, . Thus, Lemma 4.3 is applicable for . The phase starting at comprises iterations, after which we get a normalized duality gap such that , and for each , either , or .
We can take advantage of these bounds for indices . Again by Lemma 4.1, for any , we have . Thus, there are at least pairs of indices for which increases by at least between iterations at and .
We note that this analysis works regardless of whether an LLS step or an affine scaling step was performed in the iteration at .
Case II: and .
As explained above, in this case we perform an LLS step in the iteration at , and we let denote the iterate obtained by the LLS step. For , Lemma 4.4 guarantees the existence of such that , and further, . We can therefore apply Lemma 4.3. The phase starting at includes the LLS step leading to (and the subsequent centering step), and the additional iterations as in Lemma 4.3. As in Case I, we get the desired potential increase compared to the potentials at in layer .
Case III: and .
Again, the iteration at will use an LLS step. We apply Lemma 4.5, and set as in the lemma. The argument is the same as in Case II, using that Lemma 4.5 explicitly states that for any . ∎
4.1 The iteration complexity bound for the Vavasis-Ye algorithm
We now show that the potential analysis described above also gives an improved bound for the original VY algorithm [VY96].
We recall the VY layering step. Order the variables via such that . The layers will be consecutive sets in the ordering; a new layer starts with each time , for a parameter .
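A minimal sketch of this layering rule, assuming that a new layer begins whenever consecutive entries in the decreasing order differ by more than a factor g (the function name and concrete input are ours, for illustration):

```python
def vy_layers(x, g):
    """Split the coordinates of x into layers: sort the indices by
    decreasing value, and start a new layer whenever the value drops
    by more than a factor g from one position to the next."""
    order = sorted(range(len(x)), key=lambda i: -x[i])
    layers, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if x[prev] > g * x[nxt]:    # large gap: start a new layer
            layers.append(current)
            current = [nxt]
        else:
            current.append(nxt)
    layers.append(current)
    return layers
```

For example, with g = 10 the vector (100, 1, 90, 0.5) splits into the layers {100, 90} and {1, 0.5}, since the only gap exceeding a factor 10 occurs between 90 and 1.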
As outlined in the Introduction, the VY algorithm can be seen as a special implementation of our algorithm by setting . With these edge weights, we have that precisely if . (For simplicity, in the Introduction we used instead, which is almost the same in the proximity of the central path.)
With these edge weights, it is easy to see that our Layering() subroutine finds the exact same components as VY. Moreover, the layers will be the initial strongly connected components of : due to the choice of , this partition is automatically -balanced. There is no need to call Verify-Lift.
The essential difference compared to our algorithm is that the values are not lower bounds on as we require, but upper bounds instead. This is convenient for simplifying the construction of the layering. On the negative side, the strongly connected components of may no longer be strongly connected in . Hence, we cannot use Lemma 4.1, and consequently, Lemma 4.3 does not hold.
Still, the bounds overestimate by at most a factor poly. Therefore, the strongly connected components of are strongly connected in for some .
Hence, the entire argument described in this section is applicable to the VY algorithm, with a different potential function defined with instead of . This is why the iteration bound in Lemma 4.3, and therefore in Theorem 3.17, changes to a dependency.
It is worth noting that due to the overestimation of the values, the VY algorithm uses a coarser layering than our algorithm. Our algorithm splits up the VY layers into smaller parts so that remains small, but within each part, the gaps between the variables are bounded as a function of instead of .
5 Properties of the layered least square step
We now prove Lemma 3.11, showing that for a layering with small enough , the LLS step approximately satisfies (13), that is, . This also enables us to derive bounds on the norm of the residuals and on the step-length. We start by proving a few auxiliary technical claims. The next simple lemma allows us to take advantage of low lifting scores in the layering.
Lemma 5.1.
Let be two vectors such that . Let , and . Then there exists a vector such that , , and
Proof.
We let
The claim follows by the definition of the lifting score . ∎
The next lemma will be the key tool to prove Lemma 3.11. It is helpful to recall the characterization of the LLS step in Section 3.3.1.
Lemma 5.2.
Proof.
Throughout, we use the shorthand notation . We construct ; one can obtain , using that the reverse layering has lifting score in according to Lemma 3.10.
We proceed by induction, constructing for . This will be given as for a vector such that . We prove the inductive hypothesis
| (36) |
Note that (34) follows by restricting the norm on the LHS to and since the sum on the RHS is .
For , the RHS is 0. We simply set , that is, , trivially satisfying the hypothesis. Consider now , and assume that we have a satisfying (36) for . From (32) and the induction hypothesis, we get that
using also that , Proposition 3.1, and the assumptions , . Note that and are orthogonal vectors. The above inequality therefore implies
Let us now use Lemma 5.1 to obtain for , , and . That is, we get , , and
Together with the induction hypothesis (36) for , we obtain the induction hypothesis for .
∎
Proof of Lemma 3.11.
Again, we use .
Part b.
Part c.
Part d.
Let . We need to find the largest value such that . To begin, we first show that the normalized duality gap for any . For this purpose, we use the decomposition:
| (38) |
Recall from Part a that there exists and as in (32) and (33) such that and . In particular, and . Noting that and , taking the average of the coordinates on both sides of (38), we get that
| (39) |
as needed.
Let . To obtain the desired lower bound on the step-length, given (39) it suffices to show that for all that
| (40) |
We will need a bound on the product of the LLS residuals:
| (41) | ||||
using Proposition 3.1, part a, and the assumptions , . Another useful bound will be
| (42) | ||||
The last inequality uses part b. We are ready to get a bound as in (40).
This value is whenever , as needed.
Part e.
From part d, it is immediate that implies . If , we have that is the limit of (strictly) feasible solutions to (LP) and thus is also a feasible solution. Optimality of now follows from Part d, since implies . The remaining implication is that if is optimal, then . Recall that and . The optimality of means that for each , either or . Therefore, . ∎
6 Proof of the main lemmas for the potential analysis
Proof of Lemma 4.2.
We prove part (i); part (ii) follows analogously using Lemma 3.10. Let be the vector obtained from Lemma 5.1, for , , and such that and
Restricting to the components in , and dividing by , we get
| (43) |
Since , from Proposition 3.1 and (16) we see that
and therefore
where the last inequality follows by Lemma 3.3.
Using the above bounds with (43), along with from Lemma 3.11c, we get
using that and . Note that is a feasible solution to the least-squares problem optimally solved for layer by , that is,
It follows that
Let us pick . Using Corollary 3.2,
completing the proof. ∎
Proof of Lemma 4.3.
Let us select a value such that
We claim that the normalized duality gap decreases to such a value within iterations. The step-lengths for the affine scaling and LLS steps are stated in Proposition 3.4 and Lemma 3.11d. Whenever the algorithm chooses an LLS step, . Thus, the progress in will be at least as much (in fact, much better) than the guarantee we use for the affine scaling step in Proposition 3.4.
Let be the central path element corresponding to , and let . From now on we use the shorthand notation
We first show that for all such , and therefore, by choice of . According to Corollary 3.2, and recalling the definition , we see that
Thus,
Using the near-monotonicity of the central path (Lemma 3.3), we have and . Together with our assumptions and , we see that
Using the assumption of the lemma, we see that follows by the choice of .
Next, consider any . From the triangle inequality in Lemma 2.20b, it follows that , which gives . From the bound , we get that
We next show that if , then . The case follows similarly.
Consider any with the corresponding central path point . The proof is complete by showing . Recall that for central path elements, we have , and . Therefore
Using Proposition 3.1, Lemma 3.3 and the assumption , we have and
Using these bounds, we get
completing the proof. ∎
It remains to prove Lemma 4.4 and Lemma 4.5, addressing the more difficult case . It is useful to decompose the variables into two sets. We let
| (44) |
The assumption implies that for every layer , either or . The next lemma describes the relations between and .
Lemma 6.1.
Let for , and assume and . For the next iterate , we have
-
For ,
-
For ,
-
If or , then
-
If and , then
Proof.
By construction of the LLS step, , by recalling that . Using the bound derived above, for we get
using also Corollary 3.2. We can write
The claimed bounds follow again using Corollary 3.2 with the choice .
To get the upper bound on , we use
We can upper bound the first term by as above, and the second term by again from Corollary 3.2.
Part b.
Analogously to a.
Part c.
Part d.
Follows by parts a and b, and by the lower bound on obtained from Lemma 3.11d. ∎
Proof of Lemma 4.4.
Without loss of generality, let ; thus, . The case and can be treated analogously.
By Lemma 3.11c, , and therefore Lemma 4.2 provides a such that . Using Lemma 3.3, .
The final statement for any is also easy to see. From Lemma 6.1c and the strong connectivity of in , we obtain that is strongly connected in . Hence, follows by Lemma 4.1.
The rest of the proof is dedicated to showing the existence of an such that . For this purpose, we will prove that
| (45) |
Thus, the lemma follows immediately from (45): for at least one , we must have .
Using the triangle inequality, we get
| (46) |
We bound the two terms separately, starting with an upper bound on . Since , we have that
where the last inequality follows by Lemma 3.3, Corollary 3.2, and . We can use this and Lemma 6.1b to obtain
| (47) |
using the definition of .
The first term in (46) will be bounded as follows.
Claim 6.2.
.
Before proving the claim, we show how the inequality (45), and thus, the lemma, follow. Using Lemma 3.11d,
Proof of Claim 6.2.
We recall the characterization (24) of the LLS step . Namely, there exists that is the unique solution to . From the above, note that
From the Cauchy-Schwarz inequality,
| (48) | ||||
Here, we used that and . Note that
Therefore,
By Lemma 5.2, there exists such that . Therefore, using the orthogonality of and , we get that
From the above inequalities, we see that
The claim follows by showing . From Lemma 3.11d, we obtain
The claim follows by the assumption , and the choice of . ∎
As pointed out at the beginning of the proof of the claim, this finishes the proof of the lemma. ∎
Proof of Lemma 4.5.
Recall the sets and defined in (44). The key is to show the existence of an edge
| (49) |
Before proving the existence of such and , we show how the rest of the statements follow, namely, that , and for each , . According to Lemma 4.1, the latter is true (even with the stronger bound ) whenever , or , or if and . It is left to show the lower bound on and for and .
From Lemma 6.1c, we have that if or , then . Hence, the strong connectivity of and in implies the strong connectivity of these sets in . Together with the edge , we see that every can reach every on a directed path of length in . Applying Lemma 4.1 for this setting, we obtain for all such pairs, and also .
The rest of the proof is dedicated to showing the existence of and as in (49). We let such that . To simplify the notation, we let .
When constructing the layering in Layering(), the subroutine Verify-Lift() was called for the set , with the answer ‘pass’. Besides , this guaranteed the stronger property that for the matrix implementing the lift (see Remark 2.12).
Let us recall how this matrix was obtained. The subroutine starts by finding a minimal such that . Recall that and for every .
Consider the optimal lifting . We defined as the matrix sending any to the corresponding vector . The column can be computed as for .
We consider the transformation
This maps .
Let be the singular vector corresponding to the maximum singular value of , namely, . Let us normalize such that . Thus,
Let us now apply to . Since is the minimum-norm lift operator, we see that
We can upper bound the operator norm by the Frobenius norm , and therefore
Let us fix and as the indices giving the maximum value of . Note that .
Let us now use Lemma 2.10 for the pair , the matrix and the subspace . Noting that , we obtain . Now,
| (50) |
The next claim finishes the proof.
Claim 6.3.
For and selected as above, (49) holds.
Proof.
holds by (50). From the above, we have
According to Remark 2.12, follows since Verify-Lift() returned with ‘pass’. We thus have
Lemma 6.1 excludes the scenarios , , and , , leaving and as the only possibility. Therefore, and . We have since and . ∎
7 Initialization
Our main algorithm (Algorithm 2 in Section 3.5), requires an initial solution . In this section, we remove this assumption by adapting the initialization method of [VY96] to our setting.
We use the “big- method”, a standard initialization approach for path-following interior point methods that introduces an auxiliary system whose optimal solutions map back to the optimal solutions of the original system. The primal-dual system we consider is
| (Init-LP) |
The constraint matrix used in this system is
The next lemma asserts that the condition number of is not much bigger than that of the original system (LP).
Lemma 7.1 ([Vy96, Lemma 23]).
We extend this bound for .
Lemma 7.2.
.
Proof.
Let and let denote the matrix consisting of three copies of , i.e.
| Then | ||||
Row-scaling does not change , as the kernel of the matrix remains unchanged. Thus, we can rescale the last rows of to the identity matrix, i.e., multiplying by from the left-hand side. So, we observe that
where the inequality follows from Lemma 7.1. The lemma now readily follows as
Also, for sufficiently large , the optimal solutions of the original system are preserved. We let be the min-norm solution to , i.e., .
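For a full-row-rank matrix, the min-norm solution referred to here is given by the classical formula x = Aᵀ(AAᵀ)⁻¹b. A small self-contained sketch (pure Python with dense matrices, for illustration only; a practical implementation would use a least-squares solver):

```python
def solve(M, rhs):
    """Solve the square system M y = rhs by Gauss-Jordan elimination
    with partial pivoting (M is a list of rows)."""
    n = len(M)
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def min_norm_solution(A, b):
    """Minimum 2-norm solution of A x = b for full-row-rank A:
    x = A^T (A A^T)^{-1} b."""
    m, n = len(A), len(A[0])
    AAt = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)]
           for i in range(m)]
    y = solve(AAt, b)
    return [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
```

For instance, the min-norm solution of the single equation x₁ + x₂ = 2 is (1, 1), since any other solution has strictly larger 2-norm.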
Proposition 7.3.
Proof.
If system (LP) is feasible, it admits a basic optimal solution with basis such that and . Using Proposition 2.1b, we see that
| (51) | ||||
| and using that we observe | ||||
| (52) | ||||
We can extend this solution to a solution of system (Init-LP) by setting and . Observe that and by (51) and (52). Furthermore, observe that by complementary slackness, this extended solution is an optimal solution to (Init-LP). The property that immediately tells us that vanishes for all optimal solutions of (Init-LP), and thus all optimal solutions of (LP) coincide with the optimal solutions of (Init-LP) with the auxiliary variables removed. ∎
The next lemma is from [MT03, Lemma 4.4]. Recall that if .
Lemma 7.4.
Let , and let . Assume that . Then and .
The new system has the advantage that it can easily be initialized with a feasible solution in close proximity to the central path:
Proposition 7.5.
We can initialize system (Init-LP) close to the central path with initial solution and parameter if .
Proof.
Detecting infeasibility
For using the extended system (Init-LP), we still need to assume that both the primal and dual programs in (LP) are feasible. For arbitrary instances, we first need to check if this is the case, or conclude that the primal or the dual (or both) are infeasible.
This can be done by employing a two-phase method. The first phase decides feasibility by running (Init-LP) with data and . The objective value of the optimal primal-dual pair is 0 if and only if (LP) has a feasible solution. If the optimal primal/dual solution has positive objective value, we can extract an infeasibility certificate in the following way.
By the characterization of as in Proposition 2.1b, there exists an optimal solution with , and so by strong duality, . From the dual, we conclude that , and therefore . On the other hand, by assumption the objective value of the dual is positive, and so .
Feasibility of the dual of (LP) can be decided by running (Init-LP) on data and with the same argument: either the objective of the dual is 0, and the dual optimal solution corresponds to a feasible dual solution of (LP); or the objective value is negative, and we extract a dual infeasibility certificate in the following way. By assumption, . Furthermore, there exists a basic optimal solution to the dual of (Init-LP) with , and therefore for the optimal primal solution . So we have , which together with yields the certificate.
Finding the right value of
Whereas Algorithm 2 does not require any estimate on or , for the initialization we need to set as in Proposition 7.3.
A straightforward guessing approach (attributed to J. Renegar in [VY96]) starts with a constant guess, say , constructs the extended system, and runs the algorithm. In case the optimal solution to the extended system does not map to an optimal solution of (LP), we restart with and try again; we continue squaring the guess until an optimal solution is found.
This would still require a series of guesses, and thus result in a dependence on in the running time. However, if we initially rescale our system using the near-optimal rescaling of Theorem 2.5, we can turn the dependence from to . The overall iteration complexity remains , since the running time for the final guess on dominates the total running time of all previous computations, due to the repeated squaring.
An alternative approach, that does not rescale the system, is to use Theorem 2.5 to approximate . In this case we repeatedly square a guess of instead of which takes iterations until our guess corresponds to a valid upper bound for .
Note that either guessing technique handles bad guesses gracefully. For the first phase, if neither a feasible solution to (LP) is returned nor a Farkas certificate can be extracted, then by the above paragraph the guess was too low. Similarly, in phase two, when feasibility was decided in the affirmative for both primal and dual, an optimal solution to (Init-LP) that corresponds to an infeasible solution to (LP) serves as a certificate that another squaring of the guess is necessary.
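The repeated-squaring guessing scheme can be sketched as follows (the oracle `is_large_enough` is a stand-in of ours for the test of whether the run with the current guess succeeds; the initial constant guess is arbitrary):

```python
def find_sufficient_guess(is_large_enough, initial=2):
    """Repeatedly square the guess until the oracle accepts it; return
    the accepted guess and the number of trials. Reaching a target T
    takes only O(log log T) trials, and the cost of the final run
    dominates all earlier ones due to the repeated squaring."""
    guess, trials = initial, 0
    while True:
        trials += 1
        if is_large_enough(guess):
            return guess, trials
        guess = guess * guess    # squaring doubles log(guess)
```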
References
- [ABGJ18] Xavier Allamigeon, Pascal Benchimol, Stéphane Gaubert, and Michael Joswig. Log-barrier interior point methods are not strongly polynomial. SIAM Journal on Applied Algebra and Geometry, 2(1):140–178, 2018.
- [AMO93] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc., 1993.
- [BE14] Sébastien Bubeck and Ronen Eldan. The entropic barrier: a simple and optimal universal self-concordant barrier. arXiv preprint arXiv:1412.1587, 2014.
- [Chu14] Sergei Chubanov. A polynomial algorithm for linear optimization which is strongly polynomial under certain conditions on optimal solutions. http://www.optimization-online.org/DB_HTML/2014/12/4710.html, 2014.
- [CLS19] Michael B Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the current matrix multiplication time. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 938–942, 2019.
- [Dik67] I. I. Dikin. Iterative solution of problems of linear and quadratic programming. Doklady Akademii Nauk, 174(4):747–748, 1967.
- [DLHL15] Jesús A De Loera, Raymond Hemmecke, and Jon Lee. On augmentation algorithms for linear and integer-linear programming: From Edmonds–Karp to Bland and beyond. SIAM Journal on Optimization, 25(4):2494–2511, 2015.
- [DLKS19] Jesús A De Loera, Sean Kafer, and Laura Sanità. Pivot rules for circuit-augmentation algorithms in linear optimization. arXiv preprint arXiv:1909.12863, 2019.
- [DS08] Samuel I Daitch and Daniel A Spielman. Faster approximate lossy generalized flow via interior point algorithms. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 451–460, 2008.
- [Fra11] András Frank. Connections in Combinatorial Optimization. Number 38 in Oxford Lecture Series in Mathematics and its Applications. Oxford University Press, 2011.
- [GL97] Clovis C. Gonzaga and Hugo J. Lara. A note on properties of condition numbers. Linear Algebra and its Applications, 261(1):269–273, 1997.
- [Gof80] Jean-Louis Goffin. The relaxation method for solving systems of linear inequalities. Mathematics of Operations Research, 5(3):388–414, 1980.
- [Gon92] Clovis C Gonzaga. Path-following methods for linear programming. SIAM Review, 34(2):167–224, 1992.
- [HT02] Jackie CK Ho and Levent Tunçel. Reconciliation of various complexity and condition measures for linear programming problems and a generalization of Tardos’ theorem. In Foundations of Computational Mathematics, pages 93–147. World Scientific, 2002.
- [Kar84] Narendra Karmarkar. A new polynomial-time algorithm for linear programming. In Proceedings of the 16th Annual ACM Symposium on Theory of Computing (STOC), pages 302–311, 1984.
- [Kha79] Leonid G Khachiyan. A polynomial algorithm in linear programming. In Doklady Akademii Nauk SSSR, volume 244, pages 1093–1096, 1979.
- [KM13] Tomonari Kitahara and Shinji Mizuno. A bound for the number of different basic solutions generated by the simplex method. Mathematical Programming, 137(1-2):579–586, 2013.
- [KOT13] Satoshi Kakihara, Atsumi Ohara, and Takashi Tsuchiya. Information geometry and interior-point algorithms in semidefinite programs and symmetric cone programs. Journal of Optimization Theory and Applications, 157:749–780, 2013.
- [KOT14] Satoshi Kakihara, Atsumi Ohara, and Takashi Tsuchiya. Curvature integrals and iteration complexities in SDP and symmetric cone programs. Computational Optimization and Applications, 57:623–665, 2014.
- [KT13] Tomonari Kitahara and Takashi Tsuchiya. A simple variant of the Mizuno–Todd–Ye predictor-corrector algorithm and its objective-function-free complexity. SIAM Journal on Optimization, 23(3):1890–1903, 2013.
- [LMT09] Guanghui Lan, Renato DC Monteiro, and Takashi Tsuchiya. A polynomial predictor-corrector trust-region algorithm for linear programming. SIAM Journal on Optimization, 19(4):1918–1946, 2009.
- [LS14] Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs in iterations and faster algorithms for maximum flow. In Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 424–433, 2014.
- [LS15] Yin Tat Lee and Aaron Sidford. Efficient inverse maintenance and faster algorithms for linear programming. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 230–249, 2015.
- [LS19] Yin Tat Lee and Aaron Sidford. Solving linear programs with linear system solves. arXiv preprint arXiv:1910.08033, 2019.
- [Mad13] Aleksander Madry. Navigating central path with electrical flows: From flows to matchings, and back. In Proceedings of the 54th IEEE Annual Symposium on Foundations of Computer Science, pages 253–262. IEEE, 2013.
- [Meg83] Nimrod Megiddo. Towards a genuinely polynomial algorithm for linear programming. SIAM Journal on Computing, 12(2):347–353, 1983.
- [Meh92] Sanjay Mehrotra. On the implementation of a primal-dual interior point method. SIAM Journal on Optimization, 2(4):575–601, 1992.
- [MMT98] Nimrod Megiddo, Shinji Mizuno, and Takashi Tsuchiya. A modified layered-step interior-point algorithm for linear programming. Mathematical Programming, 82(3):339–355, 1998.
- [MT03] Renato D. C. Monteiro and Takashi Tsuchiya. A variant of the Vavasis-Ye layered-step interior-point algorithm for linear programming. SIAM Journal on Optimization, 13(4):1054–1079, 2003.
- [MT05] Renato D. C. Monteiro and Takashi Tsuchiya. A new iteration-complexity bound for the MTY predictor-corrector algorithm. SIAM Journal on Optimization, 15(2):319–347, 2005.
- [MT08] Renato D.C. Monteiro and Takashi Tsuchiya. A strong bound on the integral of the central path curvature and its relationship with the iteration-complexity of primal-dual path-following LP algorithms. Mathematical Programming, 115(1):105–149, 2008.
- [MTY93] Shinji Mizuno, Michael Todd, and Yinyu Ye. On adaptive-step primal-dual interior-point algorithms for linear programming. Mathematics of Operations Research, 18(4):964–981, 1993.
- [O’L90] Dianne P. O’Leary. On bounds for scaled projections and pseudoinverses. Linear Algebra and its Applications, 132:115–117, April 1990.
- [OV17] Neil Olver and László A. Végh. A simpler and faster strongly polynomial algorithm for generalized flow maximization. In Proceedings of the Forty-Ninth Annual ACM Symposium on Theory of Computing (STOC), pages 100–111, 2017.
- [Ren88] James Renegar. A polynomial-time algorithm, based on Newton’s method, for linear programming. Mathematical Programming, 40(1-3):59–93, 1988.
- [Ren94] James Renegar. Is it possible to know a problem instance is ill-posed?: some foundations for a general theory of condition numbers. Journal of Complexity, 10(1):1–56, 1994.
- [Ren95] James Renegar. Incorporating condition measures into the complexity theory of linear programming. SIAM Journal on Optimization, 5(3):506–524, 1995.
- [Sch03] Alexander Schrijver. Combinatorial Optimization – Polyhedra and Efficiency. Springer, 2003.
- [SSZ91] György Sonnevend, Josef Stoer, and Gongyun Zhao. On the complexity of following the central path of linear programs by linear extrapolation II. Mathematical Programming, 52(1-3):527–553, 1991.
- [ST04] Daniel A Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), 2004.
- [Ste89] G.W. Stewart. On scaled projections and pseudoinverses. Linear Algebra and its Applications, 112:189–193, 1989.
- [Tar85] Éva Tardos. A strongly polynomial minimum cost circulation algorithm. Combinatorica, 5(3):247–255, Sep 1985.
- [Tar86] Éva Tardos. A strongly polynomial algorithm to solve combinatorial linear programs. Operations Research, 34(2):250–256, 1986.
- [Tod90] Michael J. Todd. A Dantzig–Wolfe-like variant of Karmarkar’s interior-point linear programming algorithm. Operations Research, 38(6):1006–1018, 1990.
- [TTY01] Michael J. Todd, Levent Tunçel, and Yinyu Ye. Characterizations, bounds, and probabilistic analysis of two complexity measures for linear programming problems. Mathematical Programming, 90(1):59–69, Mar 2001.
- [Tun99] Levent Tunçel. Approximating the complexity measure of Vavasis-Ye algorithm is NP-hard. Mathematical Programming, 86(1):219–223, Sep 1999.
- [Vai89] Pravin M Vaidya. Speeding-up linear programming using fast matrix multiplication. In Proceedings of the 30th IEEE Annual Symposium on Foundations of Computer Science, pages 332–337, 1989.
- [Vav94] Stephen A Vavasis. Stable numerical algorithms for equilibrium systems. SIAM Journal on Matrix Analysis and Applications, 15(4):1108–1131, 1994.
- [vdB20] Jan van den Brand. A deterministic linear program solver in current matrix multiplication time. In Proceedings of the Symposium on Discrete Algorithms (SODA), 2020. To appear. Available at https://arxiv.org/abs/1910.11957.
- [Vég17] László A. Végh. A strongly polynomial algorithm for generalized flow maximization. Mathematics of Operations Research, 42(2):179–211, 2017.
- [VW12] Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith–Winograd. In Proceedings of the 44th Annual ACM Symposium on Theory of Computing, pages 887–898, 2012.
- [VY96] Stephen A. Vavasis and Yinyu Ye. A primal-dual interior point method whose running time depends only on the constraint matrix. Mathematical Programming, 74(1):79–120, 1996.
- [Ye97] Yinyu Ye. Interior-Point Algorithms: Theory and Analysis. John Wiley and Sons, New York, 1997.
- [Ye05] Yinyu Ye. A new complexity result on solving the Markov decision problem. Mathematics of Operations Research, 30(3):733–749, 2005.
- [Ye06] Yinyu Ye. Improved complexity results on solving real-number linear feasibility problems. Mathematical Programming, 106(2):339–363, April 2006.
- [Ye11] Yinyu Ye. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4):593–603, 2011.